Add maap biomass datasets to the API (high-level steps) #76

abarciauskas-bgse · 2022-04-10T21:21:18Z

For each dataset, we will follow the following steps:

Identify the dataset and what the processing needs are

Identify dataset and where it will be accessed from.

Datasets in https://earthdata.nasa.gov/maap-biomass/ need to be published to the VEDA API so that users, such as the trilateral dashboard, can access them. We will also need these datasets in the NASA dashboard once we are ready to publish the biomass story.

These datasets are in https://github.com/MAAP-Project/biomass-dashboard-datasets/tree/main/datasets, presumably we want to publish all of those datasets to the staging API, but we should probably cross check with the biomass story being told for the trilateral dashboard and focus on the ones we are aware of.

Datasets were often required to be uploaded to a landing zone location, but we will have to go through each one to identify its location in the MAAP buckets, I believe most of them are in s3://maap-landing-zone-gccops/user-added/uploaded_objects/

Question for the group will be if we want to copy the files to our "VEDA" bucket. Who should be able to access these files?

If the dataset is ongoing (i.e. new files are continuously added and should be included in the dashboard), design and construct the forward-processing workflow.
- Each collection will have a workflow which includes discovering data files from the source, generating the cloud-optimized versions of the data and writing STAC metadata.
- Each collection will have different requirements for both the generation and scheduling of these steps, so a design step much be included for each new collection / data layer.

Design the metadata and publish to the Dev API

Review conventions for generating STAC collection and item metadata:
- Collections: STAC Collection Creation Conventions (Dashboard Specific) veda-backend#29 and STAC version 1.0 specification for collections
- Items: STAC Item Creation Conventions (Dashboard Specific) veda-backend#28 and STAC version 1.0 specification for items
- NOTE: The delta-backend instructions are specific to datasets for the climate dashboard, however not all datasets are going to be a part of the visual layers for the dashboard so I believe you can ignore the instructions that are specific to "dashboard" extension, "item_assets" in the collection and "cog_default" asset type in the item
After reviewing the STAC documentation for collections and items and reviewing existing scripts for generating collection metadata (generally with SQL) and item metadata, generate or reuse scripts for your collection and a few items to publish to the testing API. There is some documentation and examples for how to generate a pipeline or otherwise document your dataset workflow in https://github.com/NASA-IMPACT/cloud-optimized-data-pipelines. We would like to maintain the scripts folks are using to publish datasets in that repo so we can easily re-run those datasets ingest and publish workflows if necessary.
If necessary, request access and credentials to the dev database and ingest and publish to the Dev API. Submit a PR with the manual or CDK scripts used to run the workflow to publish to the Dev API and include links to the published datasets in the Dev API

Publish to the Staging API

Once the PR is approved, we can merge and publish those datasets to the Staging API

The text was updated successfully, but these errors were encountered:

abarciauskas-bgse · 2022-04-25T21:30:23Z

j08lue · 2024-04-09T07:36:37Z

Stale

abarciauskas-bgse assigned slesaad and abarciauskas-bgse Apr 18, 2022

abarciauskas-bgse mentioned this issue May 3, 2022

Add NCEO Africa dataset to the API NASA-IMPACT/veda-data-pipelines#128

Closed

6 tasks

gadomski transferred this issue from NASA-IMPACT/veda-data-pipelines Sep 22, 2023

j08lue closed this as not planned Won't fix, can't repro, duplicate, stale Apr 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add maap biomass datasets to the API (high-level steps) #76

Add maap biomass datasets to the API (high-level steps) #76

abarciauskas-bgse commented Apr 10, 2022

abarciauskas-bgse commented Apr 25, 2022 •

edited

Loading

j08lue commented Apr 9, 2024

Add maap biomass datasets to the API (high-level steps) #76

Add maap biomass datasets to the API (high-level steps) #76

Comments

abarciauskas-bgse commented Apr 10, 2022

Identify the dataset and what the processing needs are

Design the metadata and publish to the Dev API

Publish to the Staging API

abarciauskas-bgse commented Apr 25, 2022 • edited Loading

Next steps

For NCEO Africa 2017

For GEDI Gridded Biomass L4B

CCI BIOMASS

ICESat-2 Boreal 2020

NASA JPL 2020

j08lue commented Apr 9, 2024

abarciauskas-bgse commented Apr 25, 2022 •

edited

Loading