STAC Collection Creation Conventions (Dashboard Specific) #29

anayeaye · 2022-02-24T21:46:00Z

Dashboard-specific notes that supplement the full stac-api collection specification. Note that there is no schema enforcement on the collection table content in pgstac—this provides flexibility but also requires caution when creating and modifying Collections.

STATUS: revised with review comments [2022-03-20]

STATUS: under review [2022-02-24]

Collection field, extension, and naming recommendations

Field &/or Extension	Recommendations
id	If dataset exists in NASA's Earthdata or presumably from some other data provider like ESA, use that ID. If appropriate, add a suffix for any additional processing that has been performed, e.g. "OMSO2PCA_cog". If dataset is not from NASA's Earthdata, we can use a human readable name with underscores like "facebook_population_density".
dashboard extension	To support the delta-ui we have added two new fields in a proposed dashboard extension. For now we are just adding the fields but after testing things out, we can formalize the extension with a hosted json schema. *Dashboard extension properties are only required for collections that will be viewed in the delta-ui dashboard.*
dashboard:is_periodic	`True/False` This boolean is used when summarizing the collection—if the collection is periodic, the temporal range of the items in the collection and the time density are all the front end needs to generate a time picker. If the items in the collection are not periodic, a complete list of the unique item datetimes is needed.
dashboard:time_density	`year`, `month`, `day`, `hour`, `minute`, or `null`. These time steps should be treated as enum when the extension is formalized. For collections with a single time snapshot this value is null.
item_assets	stac-extension/item_assets is used to explain the assets that are provided for each item in the collection. We’re not providing thumbnails yet, but the example below includes a thumbnail asset to illustrate how the extension will be used. The population of this property is not automated; the creator of the collection writes the item assets documentation. *Item assets are only required for collections that will be viewed in the delta-ui dashboard.*
summaries	The implementation of this core stac-spec field is use-case specific. Our implementation is intended to support the dashboard and will supply datetime and raster statistics for the default map layer asset across the entire collection. *Currently summaries are manually updated with a delta-ui specific user defined function in pgstac.*
title and description	Use these properties to provide specific information about the collection to API users and catalog browsers. These properties correspond to dataset name and info in the covid-api, but the delta dashboard will use delta-config to set these values in the UI, so the information in our stac collections is for data curators and API users.
collection name style choices	Prefer lower-case kebab-case collection names. Decision: Should names align with underlying data identifiers or should it be an interpreted name? `omi-trno2-dhrm` and `omi-trno2-dhrm-difference` vs `no2-monthly` and `no2-monthly-diff`; `bmhd-30m-monthly` vs `nightlights-hd-monthly`
license	SPDX license id. The license is likely available in CMR, but we may need to research other sources for some data. Default open license: `CC0-1.0`
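
As a rough illustration of the two dashboard fields in the table above, the sketch below checks a dict of fields against the listed values. The function name `check_dashboard_fields` and the error messages are hypothetical, and no formal schema exists yet, so treat this as one possible reading of the table rather than the extension's definition.

```python
# Allowed values taken from the dashboard:time_density row above.
# None stands for JSON null (a collection with a single time snapshot).
ALLOWED_TIME_DENSITY = ("year", "month", "day", "hour", "minute", None)


def check_dashboard_fields(fields: dict) -> list[str]:
    """Return a list of problems found in the dashboard extension fields.

    `fields` is whatever dict holds the dashboard:* keys (hypothetical helper;
    where the keys live in the collection is left to the caller).
    """
    problems = []

    # is_periodic must be a real boolean, not the string "True"/"False".
    if not isinstance(fields.get("dashboard:is_periodic"), bool):
        problems.append("dashboard:is_periodic must be true or false")

    # time_density must be present; null is allowed but must be explicit.
    if "dashboard:time_density" not in fields:
        problems.append("dashboard:time_density is missing")
    elif fields["dashboard:time_density"] not in ALLOWED_TIME_DENSITY:
        problems.append("dashboard:time_density is not a recognized time step")

    return problems


print(check_dashboard_fields(
    {"dashboard:is_periodic": True, "dashboard:time_density": "year"}
))
```

An empty list means both fields look consistent with the table; since pgstac does not enforce a schema, a check like this would have to run before the collection is loaded.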

item_assets example


"item_assets": {
    "thumbnail": {
      "type": "image/jpeg",
      "roles": [
        "thumbnail"
      ],
      "title": "Thumbnail",
      "description": "A medium sized thumbnail"
    },
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": [
        "data",
        "layer"
      ],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  }

summaries example for periodic collection

"summaries": {
    "datetime": ["2016-01-01T00:00:00Z", "2022-01-01T00:00:00Z"],
    "cog_default": {
      "max": 50064805976866820,
      "min": -6618294421291008
    }
  }

summaries example for non-periodic collection

"summaries": {
    "datetime": [
      "2020-01-01T00:00:00Z",
      "2020-02-01T00:00:00Z",
      "2020-03-01T00:00:00Z",
      "2020-04-01T00:00:00Z",
      "2020-05-01T00:00:00Z",
      "2020-06-01T00:00:00Z",
      "2020-07-01T00:00:00Z",
      "2020-08-01T00:00:00Z",
      "2020-09-01T00:00:00Z",
      "2020-10-01T00:00:00Z",
      "2020-11-01T00:00:00Z",
      "2020-12-01T00:00:00Z",
      "2021-01-01T00:00:00Z",
      "2021-02-01T00:00:00Z",
      "2021-03-01T00:00:00Z",
      "2021-04-01T00:00:00Z",
      "2021-05-01T00:00:00Z",
      "2021-06-01T00:00:00Z",
      "2021-07-01T00:00:00Z",
      "2021-08-01T00:00:00Z",
      "2021-09-01T00:00:00Z"
    ],
    "cog_default": {
      "max": 255,
      "min": 0
    }
  }
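
To show how a client might use the two summary shapes above, here is a minimal sketch that turns a `datetime` summary into the list of time steps for a time picker. The function names are hypothetical, the end of the range is assumed to be inclusive, and the stepping assumes dates that exist in every period (for example the first of the month); the actual front end logic may differ.

```python
from datetime import datetime, timedelta, timezone

FIXED_STEPS = {
    "day": timedelta(days=1),
    "hour": timedelta(hours=1),
    "minute": timedelta(minutes=1),
}


def parse(ts: str) -> datetime:
    """Parse the 'YYYY-MM-DDTHH:MM:SSZ' strings used in the summaries."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def next_step(dt: datetime, density: str) -> datetime:
    """Advance one time step. Assumes the day of month exists in the next period."""
    if density == "year":
        return dt.replace(year=dt.year + 1)
    if density == "month":
        if dt.month == 12:
            return dt.replace(year=dt.year + 1, month=1)
        return dt.replace(month=dt.month + 1)
    return dt + FIXED_STEPS[density]


def time_steps(summary_datetime: list[str], is_periodic: bool, time_density) -> list[datetime]:
    """Expand a summaries.datetime value into the datetimes a picker could offer."""
    if not is_periodic or time_density is None:
        # Non-periodic: the summary already lists every unique datetime.
        return [parse(ts) for ts in summary_datetime]
    # Periodic: the summary is a [start, end] pair; fill in the steps between.
    start, end = (parse(ts) for ts in summary_datetime)
    steps = []
    current = start
    while current <= end:  # end treated as inclusive (assumption)
        steps.append(current)
        current = next_step(current, time_density)
    return steps


yearly = time_steps(["2016-01-01T00:00:00Z", "2022-01-01T00:00:00Z"], True, "year")
print([dt.year for dt in yearly])
```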


anayeaye · 2022-02-25T21:26:06Z

#32

anayeaye · 2022-03-02T18:42:46Z

@abarciauskas-bgse @jvntf @slesaad Can you weigh in on these tickets for STAC metadata conventions in regards to the data ingests we are doing and point out anything that should be adjusted or added? We are definitely going to need to make adjustments for datetimes (start/end vs nominal datetime), anything else?

Collections: STAC Collection Creation Conventions (Dashboard Specific) #29 (this issue)
Items: STAC Item Creation Conventions (Dashboard Specific) #28

abarciauskas-bgse · 2022-03-09T21:38:38Z

This is really great @anayeaye your table and examples are 💯

Questions about naming of some of the fields:

Was there a precedent for time_density? What do you think about time_unit or time_period?
I'm wondering if cog_default is too specific and it should be something more generic like tiling_defaults. I think that information will be used to generate default rescaling right? I could imagine generating rescaling parameters where the source is something other than a COG

Additional questions about values in summaries:

How do we derive cog_default min/max/avg values? Do we ask the science teams or inspect all the files being ingested (or sample them if there are a sufficiently large number)?
Is datetime required if it is the same as the temporal interval?

Is this a valid example of the conventions you are proposing:

{
    "id": "OMSO2PCA",
    "type": "Collection",
    "links": [],
    "title": "OMSO2PCA", 
    "extent": {
        "spatial": {
            "bbox": [
                [
                    -180,
                    -90,
                    180,
                    90
                ]
            ]
        },
        "temporal": {
            "interval": [
                [
                    "2005-01-01T00:00:00Z",
                    "2021-01-01T00:00:00Z"
                ]
            ]
        }
    },
    "license": "MIT",
    "description": "OMI/Aura Sulfur Dioxide (SO2) Total Column L3 1 day Best Pixel in 0.25 degree x 0.25 degree V3",
    "stac_version": "1.0.0",
    "summaries": {
        "datetime": [
            "2005-01-01T00:00:00Z",
            "2021-01-01T00:00:00Z"
        ],
        "cog_default": {
            "avg": 287.90577560637,
            "max": 478.89999389648,
            "min": 51
        }
    },
    "properties": {
        "dashboard:is_periodic": true,
        "dashboard:time_density": "year"
    }    
}

anayeaye · 2022-03-10T01:04:22Z

@abarciauskas-bgse thanks for digging into this! To unblock UI development we did just settle on a few solutions that we could commit to deliver for the dashboard UI. I don't know that it is too late to make changes, but at this point any change will impact the front end, so we'd have to coordinate to not break anything.

Was there a precedent for time_density? What do you think about time_unit or time_period?

We did discuss other keys but since this dashboard extension is purely for the front end, UI got final vote on preferences.

--

I'm wondering if cog_default is too specific and it should be something more generic like tiling_defaults. I think that information will be used to generate default rescaling right? I could imagine generating rescaling parameters where the source is something other than a COG

Yeah, it is for rescaling parameters. This is an incremental solution that only supports simple products with single band COG assets. I think it needs to be somewhat specific but other asset keys might fit better (even just cog but at this point we would need to coordinate a change). I think that if we added non-COG assets that needed rescaling values, we'd add a new asset key and write a new function to derive the values (what is implemented here actually uses the stac raster extension and is COG specific).

Cog_default came out of the need to create a consistent asset key for the map tiler and to make it easier to automate the summary. It is supposed to be a catch-all asset key for all of our basic single band dataset products. The cog_default key doesn't cover collections like HLS, which will not have a default asset to display, and we don't really want to be calculating full collection statistics for the reflectance data anyway. There is a running delta-config discussion describing how the UI handles these two types of collections differently, including the settings for map layers that require something more complex than a simple rescale.

--

How do we derive cog_default min/max/avg values? Do we ask the science teams or inspect all the files being ingested (or sample them if there are a sufficiently large number)?

We have a user defined function for pgstac. The evolution of the function is tracked in issue #31, and I am working on adding that function to our deployment in [PR 34](#43). It is not a perfect solution, but the goal is to make a simple function call that could be the terminal step in an ingest pipeline (maybe a fan-in to a single pgstac function call that will dynamically create the summary). We're also creating an update-all method that will update any collection that has the necessary dashboard metadata attributes, which we might want to schedule to run regularly. It would be preferable if we could identify trigger events to run when needed for a given collection.

The latest iteration nixes the average because the way it is derived is not useful (min of mins is a valid metric; average of means is less so).
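
A small sketch of why that is, using made-up per-item statistics (the numbers, the `count` field, and the function name are illustrative assumptions, not output of the pgstac function): the collection minimum is exactly the minimum of the item minimums, but a plain average of item means ignores how many pixels each item contributes.

```python
def collection_min_max(item_stats: list[dict]) -> dict:
    """Combine per-item min/max into collection-wide min/max."""
    return {
        "min": min(s["min"] for s in item_stats),
        "max": max(s["max"] for s in item_stats),
    }


# Illustrative values only: two items with very different pixel counts.
item_stats = [
    {"min": 0.0, "max": 10.0, "mean": 2.0, "count": 100},
    {"min": 5.0, "max": 50.0, "mean": 20.0, "count": 900},
]

# Min of mins and max of maxes are exact for the whole collection.
print(collection_min_max(item_stats))

# A plain average of the per-item means treats both items as equal in size...
average_of_means = sum(s["mean"] for s in item_stats) / len(item_stats)

# ...while the true mean needs each item's pixel count as a weight.
weighted_mean = (
    sum(s["mean"] * s["count"] for s in item_stats)
    / sum(s["count"] for s in item_stats)
)
print(average_of_means, weighted_mean)
```

The two averages only agree when every item has the same number of valid pixels, which is information a min/max-only summary does not need.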

--

Is datetime required if it is the same as the temporal interval?

For now we have committed to maintaining this information in one place for the dashboard UI. But we intend to make the creating and updating of summaries hands-off.

--

Is this a valid example of the conventions you are proposing?

Yes, but we will have a function to automatically generate the summaries for all of our non-spectral datasets if there is an item_assets property on the collection. Totally open to discussing that, but for now the SQL routine looks at the item assets property to decide whether or not to create a cog_default summary; if not, it will only create a datetime summary. And one nit: the license should be one of the predefined SPDX licenses because stac browsers will link to the SPDX license on an id lookup. But this does not impact any of our features, so it's the kind of thing we'll probably want to circle back on when we have easier ways to edit the metadata.
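
The branching described above could look roughly like the Python sketch below. The real routine is SQL inside pgstac; here the function name, the input shapes, and the choice to test for a `cog_default` key inside `item_assets` are all assumptions made for illustration.

```python
def build_summaries(collection: dict, datetimes: list[str], cog_stats: list[dict]) -> dict:
    """Sketch of the summary decision: always datetime, cog_default only if declared.

    `datetimes` are item datetimes as 'YYYY-MM-DDTHH:MM:SSZ' strings, which sort
    chronologically as plain strings. `cog_stats` holds per-item min/max values.
    """
    ordered = sorted(set(datetimes))
    if collection.get("dashboard:is_periodic") and ordered:
        # Periodic: the range is enough for the front end.
        datetime_summary = [ordered[0], ordered[-1]]
    else:
        # Non-periodic: list every unique datetime.
        datetime_summary = ordered
    summaries = {"datetime": datetime_summary}

    # Only summarize raster values when the collection declares the asset
    # (assumed here to mean a cog_default key in item_assets).
    if "cog_default" in collection.get("item_assets", {}) and cog_stats:
        summaries["cog_default"] = {
            "min": min(s["min"] for s in cog_stats),
            "max": max(s["max"] for s in cog_stats),
        }
    return summaries


with_assets = {"dashboard:is_periodic": True, "item_assets": {"cog_default": {}}}
print(build_summaries(
    with_assets,
    ["2021-01-01T00:00:00Z", "2020-01-01T00:00:00Z"],
    [{"min": 0, "max": 200}, {"min": 3, "max": 255}],
))
```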

abarciauskas-bgse · 2022-03-10T04:48:33Z

Thanks so much for all these detailed responses and apologies for my belated comments and that you may have had to repeat any information I should have been aware of. You obviously have thought through this solution comprehensively and developed some really cool functionality. I'm more than happy to implement the conventions as defined above and seek your review of all the metadata moving forward 🙇🏽‍♀️

abarciauskas-bgse · 2022-03-10T04:51:24Z

I also updated https://j2wlly6xg8.execute-api.us-east-1.amazonaws.com/collections/OMSO2PCA and the example above with the "MIT" as the license, given that's what you used for the other datasets.

abarciauskas-bgse · 2022-03-10T16:40:23Z

Decision about id's and titles:

If dataset exists in NASA's Earthdata or presumably from some other data provider like ESA, use that ID. If appropriate add an underscore for any additional processing that has been performed, e.g. "OMSO2PCA_cog"
If dataset is not from NASA's Earthdata, we can use a human readable name with underscores like "facebook_population_density"

@anayeaye I'm inclined to keep title and ID the same but do you know if there is a good use case where they should be different? Like dataset landing pages where the title might be a more descriptive name?

anayeaye · 2022-03-10T17:33:25Z

@abarciauskas-bgse

Either _ or - work but the hyphen is more consistent with other stac catalogs. This is style only and I'll use whatever we settle on.
I think having a title that is more descriptive than the id is going to be really helpful for new users when discovering data. CMR has a Title property that would be appropriate, we'd just need to append COG.
A third convention to decide on: when we pull datasets from the covid-19 dashboard: do we want to use the same ids as the source dashboard datasets? I'm inclined to make the stac collections as similar as possible to the covid-19 datasets.

jvntf · 2022-03-10T19:05:23Z

@anayeaye a small nit, should we add hour to time_density or is it left out for a reason?

abarciauskas-bgse · 2022-03-10T20:59:12Z

I suggested underscore because I find it more readable but don't have a strong preference. I would be interested to know if there is a reason other STAC catalogs are using dashes, perhaps there is some tool or database convention for which it is required to use dashes instead of underscores. But again, I don't have a strong preference.
👍🏽
I haven't inspected each dataset json file but the filenames in https://github.com/NASA-IMPACT/covid-api/tree/develop/covid_api/db/static/datasets all look reasonable to me

I think my takeaways with respect to id and title at this point are:

If the dataset has been sourced from another data archive, such as OMI from Earthdata, the id of that dataset should be re-used somewhere in the id or title or both
If the dataset has been sourced from another data archive, but changed in any way this should also be transparent in the title (such as COG)
Where it makes sense, make both the title and id as human-readable and descriptive as is reasonable, but the id should be a (much) abbreviated identifier with dashes in places of spaces.
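
For the last point, one way to derive a dashed, lower-case id from a descriptive name is sketched below. The helper `to_collection_id` is hypothetical, and lower-casing provider ids such as "OMSO2PCA" is an assumption that the thread has not settled.

```python
import re


def to_collection_id(name: str) -> str:
    """Lower-case the name and join runs of letters/digits with single dashes."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")


print(to_collection_id("Facebook Population Density"))
print(to_collection_id("OMSO2PCA_cog"))
```

The abbreviation step (choosing which words to keep) still needs a human; the helper only normalizes separators and case.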

@anayeaye what do you think of that summary ⬆️

abarciauskas-bgse · 2022-03-10T21:03:56Z

Question on summaries: if we're going to implement summaries in a terminal step, should we be adding them at all right now while we are creating the collections? I'm going to leave summaries out for now, assuming we can run the summaries function after ingest.

abarciauskas-bgse · 2022-03-10T21:23:50Z

One more thought about naming: s3 data directories should match the ids of the datasets. It would be nice to enforce this in the future but for now just stating it for the group.

anayeaye · 2022-03-11T00:52:04Z

@jvntf - totally missed hour. Definitely adding that in the edit.

anayeaye · 2022-11-10T17:01:46Z

Creative Commons Zero licensing recommendation
We may want to add some additional guidance on choosing the correct license when not provided explicitly with the data to future data curator documentation so I'm recording some notes here.

Still true: choose an SPDX license id or use proprietary (STAC community tools are built to link to the SPDX license when an id is provided in the metadata).

Snippets/links from the discussion about choosing a license

Creative Commons Zero is recommended by data.gov: https://resources.data.gov/open-licenses/
CMIP6 data are using Creative Commons Zero (CC0-1.0) in the AWS registry
A thread discussing MIT vs CC0 https://news.ycombinator.com/item?id=11398770

cc: @slesaad @j08lue @ashiklom

j08lue · 2022-11-10T21:47:44Z

Thanks for recording this here, @anayeaye. I take it from our discussion on Slack that the decision was made in favor of CC0-1.0. Where do we need to document this?

slesaad · 2022-11-11T15:25:10Z

@j08lue probably edit the first post in this thread and specify it there

j08lue · 2022-11-12T12:52:19Z

Done.

j08lue · 2023-06-12T10:59:14Z

These docs are now published at https://nasa-impact.github.io/veda-docs/contributing/dataset-ingestion/stac-collection-conventions.html

Perhaps this issue can now be closed, and in the future we maintain this information on the docs site?

jsignell · 2023-06-12T15:01:15Z

I'm going to close and lock this issue.

anayeaye added the question Further information is requested label Feb 25, 2022

This was referenced Mar 9, 2022

Update metadata and ingest annual SO2 and NO2 datasets to staging API NASA-IMPACT/veda-data-pipelines#94

Closed

Ingest HLS subsets into the staging API NASA-IMPACT/veda-data-pipelines#95

Closed

jvntf mentioned this issue Mar 10, 2022

Update HLS collection sql script NASA-IMPACT/veda-data-pipelines#96

Closed

This was referenced Mar 16, 2022

Add facebook collection and item to staging API NASA-IMPACT/veda-data-pipelines#104

Closed

Set guidelines for data ingestion with VEDA engineers NASA-IMPACT/veda-documentation#2

Closed

This was referenced Mar 27, 2022

Add a ICESat-2 L4 Monthly Gridded Sea Ice Thickness, Version 1 to the API NASA-IMPACT/veda-data-pipelines#108

Closed

Add a Ocean NPP dataset to the API (high-level steps) NASA-IMPACT/veda-data-pipelines#110

Closed

abarciauskas-bgse mentioned this issue Sep 22, 2023

Add maap biomass datasets to the API (high-level steps) NASA-IMPACT/veda-data#76

Closed

abarciauskas-bgse mentioned this issue Sep 22, 2023

Add extended black marble nightlights data to the API (high-level steps) NASA-IMPACT/veda-data#75

Open

This was referenced May 1, 2022

Add VIIRS nightlights dataset to the API NASA-IMPACT/veda-data-pipelines#126

Closed

Add GEOGLAM dataset to the API NASA-IMPACT/veda-data-pipelines#127

Closed

Add NCEO Africa dataset to the API NASA-IMPACT/veda-data-pipelines#128

Closed

abarciauskas-bgse mentioned this issue Sep 22, 2023

Add GEDI L4B dataset to the API NASA-IMPACT/veda-data#74

Closed

This was referenced Sep 22, 2023

Add CCI Biomass product from 2010, 2017 and 2018 to the VEDA STAC API (high-level steps) NASA-IMPACT/veda-data#72

Open

Create COGs and publish LIS dataset to the API NASA-IMPACT/veda-data-pipelines#144

Closed

abarciauskas-bgse mentioned this issue Jul 15, 2022

Add a new dataset to the API (high-level steps) NASA-IMPACT/veda-data-pipelines#157

Closed

anayeaye mentioned this issue Oct 19, 2022

Publish EIS Annual Fire-Hydro COGs NASA-IMPACT/veda-data-pipelines#206

Closed

2 tasks

This was referenced Nov 1, 2022

Add Caldor fire data for the upcoming EIS-related Discovery story NASA-IMPACT/veda-data-pipelines#216

Closed

Add ECCO ocean state estimate NASA-IMPACT/veda-data-pipelines#217

Closed

j08lue mentioned this issue Nov 16, 2022

Add Future Snow Projections datasets NASA-IMPACT/veda-data-pipelines#237

Closed

11 tasks

TimLahmers mentioned this issue Sep 22, 2023

Add new LIS-based Western US Datasets NASA-IMPACT/veda-data#66

Open

10 tasks

j08lue mentioned this issue Feb 1, 2023

Add data for cryosphere story NASA-IMPACT/veda-data-pipelines#279

Closed

10 tasks

smohiudd mentioned this issue Feb 9, 2023

Add COVID Recovery Proxy Map dataset NASA-IMPACT/veda-data-pipelines#282

Closed

3 tasks

j08lue mentioned this issue Mar 1, 2023

Add Fire Discovery datasets on Thomas Fire NASA-IMPACT/veda-data-pipelines#305

Closed

10 tasks

aboydnw mentioned this issue May 2, 2023

Add GST CO2 dataset NASA-IMPACT/veda-data-pipelines#346

Closed

10 tasks

This was referenced May 23, 2023

Documentation for adding datasets NASA-IMPACT/veda-data#3

Closed

Transfer docs on metadata for collections and items NASA-IMPACT/veda-data#4

Closed

j08lue mentioned this issue May 26, 2023

Add docs for content contribution and tidy up table of contents NASA-IMPACT/veda-docs#60

Merged

j08lue mentioned this issue Jun 12, 2023

Add stac collection and item conventions NASA-IMPACT/veda-docs#64

Merged

jsignell closed this as completed Jun 12, 2023

NASA-IMPACT locked as resolved and limited conversation to collaborators Jun 12, 2023
