EPIC: Incorporate github based dataset metadata workflow #135

Open
leothomas opened this issue Sep 9, 2021 · 1 comment

@leothomas
Contributor

See: https://github.com/NASA-IMPACT/dashboard-api-starter

@leothomas
Contributor Author

[IN PROGRESS]

Context:

Metadata files contain information about how a dataset should be displayed (legend stops, color map, rescale, etc.), as well as where to find the COGs in S3 and which dates are available for each dataset. Originally, all of this information lived in JSON files stored in the dashboard (frontend) code repository. To avoid having to manually update the available dates and re-deploy the dashboard with each new data delivery, the generation of each dataset's domain (its available dates) was moved to a backend process. A /datasets endpoint was created which would, for each dataset, scan the S3 bucket to collect all available files, then extract and return the date from each. The metadata files themselves were also moved to the backend.
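As a rough illustration of the date extraction step described above, here is a minimal sketch. The key naming convention (`..._YYYY_MM_DD.tif`), the `DATE_PATTERN` regex, and the `extract_dates` helper are all assumptions made for the example, not the API's actual code; the real file naming may differ per dataset.

```python
import re
from datetime import date

# Hypothetical naming convention: object keys end in "_YYYY_MM_DD.tif".
DATE_PATTERN = re.compile(r"(\d{4})_(\d{2})_(\d{2})\.tif$")


def extract_dates(keys):
    """Return the sorted, de-duplicated ISO dates found in a list of S3 object keys."""
    found = set()
    for key in keys:
        match = DATE_PATTERN.search(key)
        if match is None:
            continue  # skip objects that do not follow the assumed pattern
        year, month, day = (int(part) for part in match.groups())
        try:
            found.add(date(year, month, day))
        except ValueError:
            continue  # skip keys whose digits do not form a real calendar date
    return [d.isoformat() for d in sorted(found)]
```

In the original setup this kind of scan ran on every request, so its cost grew with the number of objects listed for each dataset.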

Because of the growing number of data files, and because the dashboard queried the available dates for each dataset individually, the /datasets endpoint's response was becoming too slow. The dataset domain (available dates) generation was therefore moved to a Lambda function that runs once every 24 hours and stores the available dates in a JSON file in the same S3 bucket as the rest of the data. The /datasets endpoint now simply reads from this JSON file, so its response time no longer depends on the number of data files in S3.
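The precompute-then-read pattern described above could look roughly like the sketch below. It uses a local file as a stand-in for the JSON object in S3, and the function names (`write_domains`, `read_domains`) and the `{"domain": [...]}` layout are hypothetical, chosen only to show the split between the scheduled job and the request handler.

```python
import json
from pathlib import Path


def write_domains(dates_by_dataset, path):
    """Scheduled job side: persist all precomputed domains as one JSON document."""
    domains = {
        dataset_id: {"domain": sorted(set(dates))}
        for dataset_id, dates in dates_by_dataset.items()
    }
    Path(path).write_text(json.dumps(domains, indent=2))


def read_domains(path):
    """Endpoint side: a single read of the cached document, with no bucket scan."""
    return json.loads(Path(path).read_text())
```

The trade-off is freshness: with a 24-hour schedule, newly delivered files may not show up in the endpoint's response until the next run.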

When re-thinking the structure of the API for the EO Lab-in-a-box project, @abarciauskas-bgse had the idea to move the dashboard dataset metadata files to a separate GitHub repository. This is a great idea, as it keeps the ability to version datasets and open feature branches when integrating new datasets, without requiring knowledge of the code base. It also gives us the ability to use GitHub Actions to generate dataset domains when opening or merging PRs.

I'd like to integrate this into the covid dashboard API's workflow, in order to make it easier to visualize and validate datasets in the dashboard without having to deploy the data to production.

Proposed workflow:
