[DISCUSSION] Dataset "layer" conventions in STAC (dashboard specific) #32

leothomas · 2022-02-25T20:58:23Z

Context/Background:

Dashboard Evolution has various datasets that the frontend (or more accurately, the configuration repo used by the frontend) is aware of. Some datasets have "layers" or "variations" but not all. The concept of a “layer” or variation (as opposed to a separate dataset) is that all layers of a dataset should have the same temporal domain and geometries and should all relate to the same underlying data “capture” event. The most classic example of this is spectral bands in satellite images. All bands are captured at the same time and place by a satellite, and all relate to the same time and place on earth, and provide variations of the same fundamental measured quantity. In the case of Dashboard Evolution, some datasets will have layers and others will not.

Dashboard Evolution specific examples of layers:

Atmospheric Datasets (NO2/CO2):

Each of these datasets has a single monthly average value (avg), plus a month-by-month difference from a baseline (diff), where the baseline is the average of that same month from 2005 to 2015 (e.g. avg: Jan 2020 vs. diff: Jan 2020 - AVG[Jan 2015, …, Jan 2005]).
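As a rough sketch of how the diff layer relates to the avg layer, the snippet below subtracts a per-month baseline from a monthly value. The `monthly_diff` helper, the `values` mapping, and the numbers are all made up for illustration, and the inclusive 2005 to 2015 range is my reading of the description above rather than a confirmed detail of the pipeline.

```python
from statistics import mean


def monthly_diff(values, year, month, baseline_years=range(2005, 2016)):
    """Return the monthly value minus the baseline for that calendar month.

    `values` maps (year, month) -> monthly average (hypothetical structure).
    The baseline is the mean of the same month over `baseline_years`
    (assumed here to be 2005..2015 inclusive).
    """
    baseline = mean(values[(y, month)] for y in baseline_years)
    return values[(year, month)] - baseline


# Made-up January values: 400.0 in 2005, rising by 1.0 per year to 410.0 in 2015.
values = {(y, 1): 400.0 + (y - 2005) for y in range(2005, 2016)}
values[(2020, 1)] = 420.0

diff = monthly_diff(values, 2020, 1)  # avg layer value minus the 2005-2015 baseline
```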

CMIP6:

This is possibly the most complicated “layered” dataset, since the layers have a multi-dimensional/multi-tiered hierarchy.

Each variable has been evaluated at daily values by a dozen or so ensemble models. To reduce granularity (for user experience), the Ames researchers have provided ensemble models, which average across all the models, over the entire month. There are 3 such ensemble models:

CMIP6_ensemble_median/
CMIP6_ensemble_p10/
CMIP6_ensemble_p90/

Each model has been generated in 2 different scenarios, called Shared Socio-economic Pathways, SSP (there may be more coming):

ssp245 # intermediate "middle of the road" scenario (moderate mitigation)
ssp585 # "business as usual" scenario

In this case, the layers of the CMIP6 dataset are every combination (cross product) of the SSPs and the ensemble models (e.g. ensemble_median-ssp245, ensemble_median-ssp585, ensemble_p10-ssp245, etc.).
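The CMIP6 layer names described above can be enumerated as the cross product of the two lists. This is only a sketch: the `ENSEMBLES` and `SSPS` constants and the `ensemble-ssp` naming pattern are taken from the examples in this issue, not from any existing code.

```python
from itertools import product

# Names taken from the directory listing and scenario list above.
ENSEMBLES = ["ensemble_median", "ensemble_p10", "ensemble_p90"]
SSPS = ["ssp245", "ssp585"]

# One layer per (ensemble, ssp) pair; adding a new SSP adds one layer per ensemble.
layers = [f"{ensemble}-{ssp}" for ensemble, ssp in product(ENSEMBLES, SSPS)]
```

With 3 ensembles and 2 SSPs this yields 6 layers, and the count grows multiplicatively as more scenarios arrive.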

Nightlights:

The Nightlights dataset is an example of a dataset that has no layers.

Options Considered:

Each "layer" as a different asset of the same STAC Item

/collections/co2/items
>>> { "datetime": "...", "bbox": [...], "assets":{ "avg":{...}, "diff":{...} } }

This seems like the most “correct” approach in the sense that we’ve based our idea of “layers” on the idea of spectral bands, and the official Asset documentation uses spectral bands as an example:

Item has a multispectral analytic asset, a 3-band full resolution visual asset, a down-sampled preview asset, and a cloud mask asset

src: https://github.com/radiantearth/stac-spec/blob/master/best-practices.md#asset-roles
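For concreteness, here is a minimal sketch of what an option 1 Item could look like when built as a plain Python dict, with one asset per layer. The `make_item` helper, the item id, and the `s3://example-bucket/...` hrefs are hypothetical placeholders, and the Item carries only the core fields.

```python
COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def make_item(item_id, datetime, bbox, layer_hrefs):
    """Build a minimal STAC Item dict with one asset per layer (option 1)."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": list(bbox),
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime},
        "links": [],
        # Every layer shares the Item's datetime and geometry.
        "assets": {
            layer: {"href": href, "type": COG_MEDIA_TYPE, "roles": ["data"]}
            for layer, href in layer_hrefs.items()
        },
    }


item = make_item(
    "co2-202001",
    "2020-01-01T00:00:00Z",
    (-180, -90, 180, 90),
    {
        "avg": "s3://example-bucket/co2/avg/202001.tif",    # hypothetical href
        "diff": "s3://example-bucket/co2/diff/202001.tif",  # hypothetical href
    },
)
```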

Each "layer" as its own STAC collection

/collections
>>> co2-avg, co2-diff, nightlights, cmip6-ssp245-median-ensemble, cmip6-ssp585-median-ensemble, cmip6-ssp245-p90-ensemble, cmip6-ssp585-p90-ensemble

Each "layer" as custom (filterable) property of the STAC Item

/collections/co2/items
>>> {"datetime": "...", "bbox":"...", "properties": {"custom:layer-name": "avg"}}

This approach is the best suited to the CMIP6 dataset, since it enables all models/SSPs to live in the same collection, and new models/SSPs can easily be added as new items (ref: https://planetarycomputer.microsoft.com/dataset/nasa-nex-gddp-cmip6#Example-Notebook).
Note: The above example considers each of the 9 variables as assets of each STAC Item. While each of the 9 variables has the same temporal and geographic domain, they don’t necessarily relate to the same underlying data. I’m not sure how much of a departure this is from the intended usage of STAC assets.
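To illustrate how option 3 would be queried, the sketch below builds a search request body that filters items by the custom layer property. It assumes the STAC API supports the Filter extension with CQL2 JSON, which I have not verified for our deployment; `layer_search_body` is a hypothetical helper, and `custom:layer-name` is the property name from the example above.

```python
def layer_search_body(collection, layer, limit=100):
    """Build a POST /search body selecting one layer of one collection.

    Assumes the API accepts CQL2 JSON filters (STAC API Filter extension).
    """
    return {
        "collections": [collection],
        "filter-lang": "cql2-json",
        "filter": {
            "op": "=",
            "args": [{"property": "custom:layer-name"}, layer],
        },
        "limit": limit,
    }


body = layer_search_body("co2", "avg")
```

The resulting dict would be sent as the JSON body of a POST to the API's `/search` endpoint.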

Discussion:

While option 1 seems the most “correct”, I’m concerned about the case of having to add a new layer to an existing dataset. That would entail updating the assets key of each STAC item in the database, which we don’t yet have the functionality for. Further complications come with recurring ingests (e.g. the ingestion pipeline would need logic to find a STAC record to add an asset to, or create it if it doesn’t yet exist).
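The "find or create" logic mentioned above could look roughly like the following. This is a sketch against an in-memory dict standing in for the database; `upsert_asset`, the `store` structure, and the hrefs are all hypothetical, and a real implementation would need to do the lookup and update against the actual STAC backend.

```python
def upsert_asset(store, collection, item_id, datetime, layer, href):
    """Add a layer asset to an existing item, creating the item if needed.

    `store` is a plain dict keyed by (collection, item_id), used here as a
    stand-in for the STAC database.
    """
    key = (collection, item_id)
    if key not in store:
        # First layer seen for this timestep: create the item shell.
        store[key] = {
            "id": item_id,
            "collection": collection,
            "properties": {"datetime": datetime},
            "assets": {},
        }
    # Add (or overwrite) the asset for this layer.
    store[key]["assets"][layer] = {"href": href}
    return store[key]


store = {}
upsert_asset(store, "co2", "co2-202001", "2020-01-01T00:00:00Z",
             "avg", "s3://example-bucket/co2/avg/202001.tif")
upsert_asset(store, "co2", "co2-202001", "2020-01-01T00:00:00Z",
             "diff", "s3://example-bucket/co2/diff/202001.tif")
```

Both calls end up on the same item, which is the behavior a recurring ingest would need regardless of which layer arrives first.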

Option 3 is the best adapted to the CMIP6 dataset. However, it fails against an important constraint: the dashboard needs access to the date domain of each layer. With option 3, the dashboard would have to make a paginated query against the STAC API (with filters corresponding to the desired layer) and then extract the date from each result.
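To show why this is costly, here is a sketch of the client-side loop the dashboard would need under option 3. It follows `rel: "next"` links page by page and collects each item's datetime. `layer_dates` is a hypothetical helper, and `fetch_page` is an injected callable so the example runs against fake pages rather than a real API.

```python
def layer_dates(fetch_page, first_url):
    """Collect the datetimes of all items by walking paginated results.

    `fetch_page(url)` must return a dict with "features" and "links",
    shaped like a STAC API item collection response.
    """
    dates = []
    url = first_url
    while url:
        page = fetch_page(url)
        dates.extend(f["properties"]["datetime"] for f in page["features"])
        # Follow the "next" link if the server provided one, else stop.
        url = next(
            (link["href"] for link in page.get("links", [])
             if link.get("rel") == "next"),
            None,
        )
    return sorted(dates)


# Fake two-page response used in place of a real STAC API.
FAKE_PAGES = {
    "page-1": {
        "features": [
            {"properties": {"datetime": "2020-02-01T00:00:00Z"}},
            {"properties": {"datetime": "2020-01-01T00:00:00Z"}},
        ],
        "links": [{"rel": "next", "href": "page-2"}],
    },
    "page-2": {
        "features": [{"properties": {"datetime": "2020-03-01T00:00:00Z"}}],
        "links": [],
    },
}

dates = layer_dates(FAKE_PAGES.__getitem__, "page-1")
```

The number of requests grows with the number of items in the layer, which is the overhead a collection-level summary avoids.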

@Alexandra K has already worked on an implementation for a custom date domain query by adding a domain key to the collection level summary - this requires each layer to be its own collection.
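As a rough illustration of the idea (not the actual implementation, whose exact shape may differ), a collection-level date domain could be stored like this. The `with_date_domain` helper and the placement of a `domain` key under `summaries` are assumptions made for the example.

```python
def with_date_domain(collection, item_datetimes):
    """Return a copy of the collection with a date domain in its summaries.

    The "domain" key under "summaries" is a hypothetical placement.
    """
    updated = dict(collection)
    summaries = dict(updated.get("summaries", {}))
    # Deduplicate and sort so the dashboard gets a clean list of dates.
    summaries["domain"] = sorted(set(item_datetimes))
    updated["summaries"] = summaries
    return updated


collection = {"id": "co2-avg", "summaries": {}}
collection = with_date_domain(
    collection,
    ["2020-02-01T00:00:00Z", "2020-01-01T00:00:00Z", "2020-01-01T00:00:00Z"],
)
```

With this in place the dashboard reads the dates from a single collection request, which only works if the collection holds exactly one layer.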

Decision:

For the time being we will go with option 2, and ingest each “layer” as a separate collection. This will create a large number of collections (especially in the case of CMIP6), which may negatively affect discoverability through the STAC API. We can mitigate this with custom dataset-level collections that group all of a dataset's layer-level collections for the sake of discoverability (for usage outside of the raster API).
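One way to derive such groupings is from the collection ids themselves. The sketch below assumes the `dataset-layer` naming used in the option 2 example and that dataset names contain no hyphen; `group_layers` is a hypothetical helper, not something that exists in the pipeline.

```python
from collections import defaultdict


def group_layers(collection_ids):
    """Group layer-level collection ids by dataset.

    Assumes ids look like "<dataset>-<layer>" and that the dataset part
    contains no hyphen; ids without a hyphen form their own group.
    """
    groups = defaultdict(list)
    for collection_id in collection_ids:
        dataset = collection_id.split("-", 1)[0]
        groups[dataset].append(collection_id)
    return dict(groups)


ids = [
    "co2-avg",
    "co2-diff",
    "nightlights",
    "cmip6-ssp245-median-ensemble",
    "cmip6-ssp585-median-ensemble",
]
groups = group_layers(ids)
```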

Perhaps at a later date, once the datasets to ingest have stabilized, we can consider switching to an asset-based model (option 1).


This was referenced Feb 25, 2022

STAC Item Creation Conventions (Dashboard Specific) #28

Closed

STAC Collection Creation Conventions (Dashboard Specific) #29

Closed

anayeaye mentioned this issue Mar 24, 2022

Add blue tarp detection layer NASA-IMPACT/veda-data-pipelines#79

Closed

anayeaye mentioned this issue Apr 4, 2022

Add SEDAC Social Vulnerability Index Dataset NASA-IMPACT/veda-data-pipelines#82

Closed

abarciauskas-bgse added the architecture label Apr 6, 2022

xhagrg assigned anayeaye May 23, 2022

anayeaye assigned leothomas and unassigned anayeaye May 24, 2022

This was referenced Oct 14, 2022

CMIP6 COGs NASA-IMPACT/veda-data-pipelines#191

Closed

Add a test set of CMIP6 datasets NASA-IMPACT/veda-data-pipelines#204

Closed
