This guide provides information on how VEDA runs data ingest, transformation and metadata (STAC) publication workflows via AWS Services, such as step functions.
NOTE: Since collection ingest still requires calling the database from a local machine, users must add their IP to an inbound rule on the security group attached to the RDS instance.
The collections/
directory holds json files representing the data for VEDA collection metadata (STAC).
Should follow the following format:
{
"id": "<collection-id>",
"type": "Collection",
"links":[
],
"title":"<collection-title>",
"extent":{
"spatial":{
"bbox":[
[
"<min-longitude>",
"<min-latitude>",
"<max-longitude>",
"<max-latitude>",
]
]
},
"temporal":{
"interval":[
[
"<start-date>",
"<end-date>",
]
]
}
},
"license":"MIT",
"description": "<collection-description>",
"stac_version": "1.0.0",
"dashboard:is_periodic": "<true/false>",
"dashboard:time_density": "<month/>day/year>",
"item_assets": {
"cog_default": {
"type": "image/tiff; application=geotiff; profile=cloud-optimized",
"roles": [
"data",
"layer"
],
"title": "Default COG Layer",
"description": "Cloud optimized default layer to display on map"
}
}
}
The items/
directory holds json files representing the airflow inputs for initiating discovery, ingest and publication workflows for ingesting items.
Can either be a single input event or a list of input events.
Should follow the following format:
{
"collection": "<collection-id>",
## for s3 discovery
"prefix": "<s3-key-prefix>",
"bucket": "<s3-bucket>",
"filename_regex": "<filename-regex>",
"datetime_range": "<month/day/year>",
### misc
"dry_run": "<true/false>",
}
Install dependencies:
poetry install
Create a collection json file in the data/collections/
directory. For format, check the data section.
./scripts/collection.sh .env <event-json-start-pattern>
Create an input json file in the data/items/
directory. For format, check the data section.
./scripts/item.sh .env <event-json-start-pattern>
Discovers all the files in an S3 bucket, based on the prefix and filename regex.
Discovers all the files in a CMR collection, based on the version, temporal, bounding box, and include. Returns objects that follow the specified criteria.
Converts the input file to a COG file, writes it to S3, and returns the S3 key.
Copies the data to the VEDA MCP bucket if necessary.
Given an object received from the STAC_READY_QUEUE
, builds a STAC Item, writes it to S3, and returns the S3 key.
Submits STAC items to STAC Ingestor system via POST requests.
Discovers all the files that need to be ingested either from s3 or cmr.
Converts the input files to COGs, runs in parallel.
Publishes the item to the STAC database (and MCP bucket if necessary).