Context/Background:
Dashboard Evolution has various datasets that the frontend (or more accurately, the configuration repo used by the frontend) is aware of. Some datasets have "layers" or "variations", but not all. The concept of a “layer” or variation (as opposed to a separate dataset) is that all layers of a dataset should have the same temporal domain and geometries and should all relate to the same underlying data “capture” event. The classic example of this is spectral bands in satellite images: all bands are captured at the same time and place by a satellite, all relate to the same time and place on earth, and all provide variations of the same fundamental measured quantity. In the case of Dashboard Evolution, some datasets will have layers and others will not.
Dashboard Evolution specific examples of layers:
Atmospheric Datasets (NO2/CO2):
Each of these datasets has a single monthly average value (avg), as well as a month-by-month difference from a baseline (diff), where the baseline is the average of that same month from 2005 to 2015. E.g.: Jan 2020 (avg) vs. Jan 2020 - AVG[Jan 2015, …, Jan 2005] (diff).
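As a rough illustration of how the diff layer relates to the avg layer, here is a minimal sketch. It assumes the monthly averages are available as a plain mapping from (year, month) to a value; the function name, the data structure, and the numbers are hypothetical and are not taken from the actual pipeline.

```python
from statistics import mean

# Baseline period described above: 2005 to 2015, inclusive.
BASELINE_YEARS = range(2005, 2016)


def monthly_diff(avg_by_month, year, month):
    """Return the avg for (year, month) minus the 2005-2015 mean for that month.

    avg_by_month is a hypothetical mapping of (year, month) -> monthly average.
    """
    baseline = mean(avg_by_month[(y, month)] for y in BASELINE_YEARS)
    return avg_by_month[(year, month)] - baseline


# Synthetic values, for illustration only.
january = {(y, 1): 10.0 for y in BASELINE_YEARS}
january[(2020, 1)] = 12.5

print(monthly_diff(january, 2020, 1))  # 2.5
```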
CMIP6:
This is possibly the most complicated “layered” dataset, since the layers have a multi-dimensional/multi-tiered hierarchy.
Each variable has been evaluated at daily values by a dozen or so ensemble models. To reduce granularity (for user experience), the Ames researchers have provided ensemble models, which average across all the models over the entire month. There are 3 such ensemble models:
Each model has been generated in 2 different scenarios, called Shared Socioeconomic Pathways, or SSPs (there may be more coming):
ssp245 # intermediate ("middle of the road") scenario with moderate mitigation
ssp585 # "business as usual" scenario
In this case, the layers of the CMIP6 dataset are the combinations of the SSPs and the ensemble models (e.g. ensemble_median-ssp245, ensemble_median-ssp585, ensemble_p10-ssp245, etc.).
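The layer naming above amounts to a cross product of ensemble models and scenarios. A minimal sketch, assuming the `model-ssp` naming pattern seen in the examples; only the two ensemble names that appear in this issue are used, so the model list is deliberately incomplete, and the function name is hypothetical.

```python
from itertools import product

# Only the ensemble names that appear in the examples above; the full list
# of 3 ensemble models is not reproduced here.
ENSEMBLE_MODELS = ["ensemble_median", "ensemble_p10"]
SCENARIOS = ["ssp245", "ssp585"]


def layer_ids(models, scenarios):
    """Build one layer identifier per (model, scenario) pair."""
    return [f"{model}-{ssp}" for model, ssp in product(models, scenarios)]


for layer in layer_ids(ENSEMBLE_MODELS, SCENARIOS):
    print(layer)
```

Adding a new scenario or ensemble model multiplies the number of layers, which is why the number of collections grows quickly when each layer becomes its own collection.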
Nightlights:
The Nightlights dataset is an example of a dataset that has no layers.
Options Considered:
Option 1: Each "layer" as a different asset of the same STAC Item
This seems like the most “correct” approach, in the sense that we’ve based our idea of “layers” on spectral bands, and the official Asset documentation uses spectral bands as an example:
"Item has a multispectral analytic asset, a 3-band full resolution visual asset, a down-sampled preview asset, and a cloud mask asset"
src: https://github.com/radiantearth/stac-spec/blob/master/best-practices.md#asset-roles
This approach is the most well suited to the CMIP6 dataset since it enables all models/ssps to live in the same collection and new models/ssps can be easily added as new items (ref: https://planetarycomputer.microsoft.com/dataset/nasa-nex-gddp-cmip6#Example-Notebook)
Note: The above example considers each of the 9 variables as assets of each STAC Item. While each of the 9 variables has the same temporal and geographic domain, they don’t necessarily relate to the same underlying data. I’m not sure how much of a departure this is from the intended usage of STAC assets.
Discussion:
While option 1 seems the most “correct”, I’m concerned about the case of adding a new layer to an existing dataset, which would entail updating the assets key of every STAC item in the database, something we don’t yet have functionality for. Recurring ingests bring further complications (e.g. the ingestion pipeline would need logic to find the STAC record to add an asset to, or to create it if it doesn’t yet exist).
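To make the "find or create" concern concrete, here is a minimal sketch of the upsert logic an asset-based ingest would need. It uses an in-memory dict as a stand-in for the STAC database; the function name, the item IDs, and the hrefs are hypothetical, and a real implementation would go through whatever write path the database actually exposes.

```python
def upsert_asset(items, item_id, asset_key, href, datetime_str):
    """Add an asset to an existing item, or create the item first if missing.

    items is a hypothetical in-memory store mapping item id -> STAC-like dict.
    """
    item = items.get(item_id)
    if item is None:
        # No record yet for this date/geometry: create a bare item.
        item = {
            "type": "Feature",
            "id": item_id,
            "properties": {"datetime": datetime_str},
            "assets": {},
        }
        items[item_id] = item
    # Adding a layer means touching the assets key of the existing record.
    item["assets"][asset_key] = {"href": href}
    return item


store = {}
upsert_asset(store, "no2-2020-01", "avg", "s3://example-bucket/no2/avg/2020-01.tif", "2020-01-01T00:00:00Z")
upsert_asset(store, "no2-2020-01", "diff", "s3://example-bucket/no2/diff/2020-01.tif", "2020-01-01T00:00:00Z")
print(sorted(store["no2-2020-01"]["assets"]))  # ['avg', 'diff']
```

Adding a new layer to an existing dataset would mean running this kind of update once per item, which is the bulk operation described above as not yet supported.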
Option 3 is the best adapted to the CMIP6 dataset; however, it fails an important constraint: the dashboard needs access to the date domain of each layer. With option 3, the dashboard would have to make a paginated query against the STAC API (with filters corresponding to the desired layer) and then extract the date from each result.
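The cost of that approach can be sketched as follows. This is an illustration only: `fetch_page` is a hypothetical stand-in for one paginated STAC API request, the `layer` property used for filtering is an assumption, and the fake API below exists purely so the sketch runs without a network.

```python
def collect_datetimes(fetch_page, layer):
    """Walk every page of results for one layer and gather the item datetimes.

    fetch_page(layer, token) is a hypothetical callable returning
    (items, next_token), with next_token set to None on the last page.
    """
    dates = []
    token = None
    while True:
        items, token = fetch_page(layer, token)
        dates.extend(item["properties"]["datetime"] for item in items)
        if token is None:
            break
    return sorted(dates)


def make_fake_api(all_items, page_size):
    """In-memory stand-in for a paginated, filterable item search."""
    def fetch_page(layer, token):
        matching = [i for i in all_items if i["properties"]["layer"] == layer]
        start = token or 0
        end = start + page_size
        next_token = end if end < len(matching) else None
        return matching[start:end], next_token
    return fetch_page


fake_items = [
    {"properties": {"layer": "ensemble_median-ssp245", "datetime": f"2020-{m:02d}-01"}}
    for m in range(1, 6)
] + [{"properties": {"layer": "ensemble_p10-ssp245", "datetime": "2020-01-01"}}]

api = make_fake_api(fake_items, page_size=2)
print(collect_datetimes(api, "ensemble_median-ssp245"))
```

Every layer's date domain requires one request per page of items, whereas a domain stored at the collection level can be read in a single request.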
@Alexandra K has already worked on an implementation for a custom date domain query by adding a domain key to the collection level summary - this requires each layer to be its own collection.
Decision:
For the time being we will go with option 2 and ingest each “layer” as a separate collection. This will create a large number of collections (especially in the case of CMIP6), which may negatively affect discoverability through the STAC API. We can mitigate this with custom dataset collections that group all layer-level collections for the sake of discoverability (for usage outside of the raster API).
Perhaps at a later date, once the datasets to ingest have stabilized, we can consider switching to an asset-based model (option 1).