This repository has been archived by the owner on Sep 26, 2023. It is now read-only.
From @leothomas :
The CMIP6 dataset spans a massive temporal range (1950–2100) compared to most other datasets (roughly 2000–2030), so it required a fundamental shift in the way PGSTAC (the database) is organized. See below for a more detailed description of the required change.
Alex was working on integrating PGSTAC v0.5 into the VEDA backend to enable ingestion of the CMIP6 dataset, but the very modifications that enable ingesting CMIP6 imply having to update/re-ingest all of the other datasets.
CMIP6 alone is pretty massive (3.6 TB), so we were holding off on re-ingesting everything until we have deployed the backend to MCP (since we'll have to re-ingest everything there anyway).
The COGs have already been generated for the CMIP6 dataset, so we will just have to generate STAC records for them and copy them over.
Background info on why we need to stand up a new database in order to integrate the changes necessary to ingest CMIP6:
PGSTAC partitions data on commonly searched fields (date, geometry, etc.) to make the database more performant.
Up until v0.5, PGSTAC organized everything into weekly partitions, which was great, since most datasets had a temporal range of no more than ~20 years (20 * 52 = 1040 partitions). Along comes CMIP6 with a 150 year temporal range, which forces the database to create 150 * 52 = 7800 partitions, which crashes the database.
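The arithmetic above can be sketched directly (purely illustrative; the week counts are back-of-the-envelope approximations, as in the original estimate):

```python
# Approximate partition counts under pgstac's pre-0.5 weekly partitioning scheme.
WEEKS_PER_YEAR = 52

def weekly_partitions(years: int) -> int:
    """Number of weekly partitions needed to cover a temporal range of `years` years."""
    return years * WEEKS_PER_YEAR

typical = weekly_partitions(20)   # most datasets: ~20-year range -> 1040 partitions
cmip6 = weekly_partitions(150)    # CMIP6: 1950-2100 -> 7800 partitions

print(typical)  # 1040
print(cmip6)    # 7800
```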
To solve this, David Bitner did a huge refactor of the database to allow partitioning first on collections, and then optionally on time, with the option to partition by year or month.
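As a rough sketch of what that looks like in pgstac ≥ 0.5 (the `partition_trunc` setting on the `collections` table is how the per-collection time partitioning is exposed; the collection id `'CMIP6'` here is an assumption for illustration, not copied from the actual deployment):

```sql
-- Everything is partitioned by collection first; a collection with a long
-- temporal range can additionally be partitioned by year (or month),
-- e.g. ~150 yearly partitions for CMIP6 instead of ~7800 weekly ones.
UPDATE collections
   SET partition_trunc = 'year'
 WHERE id = 'CMIP6';
```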
Once partitions have been created in a database, you can't really delete them, so the solution is to deploy a new database, and then re-ingest everything.
Tasks
Upgrade database in lower level environment
Test adding CMIP6 to new database + test other datasets still work as expected
Acceptance criteria
CMIP6 data available in the staging API
Subsequent tickets
CMIP story in new dashboard