Integrate concept of GeoParquet in catalog generation #19

Open
santilland opened this issue Sep 10, 2024 · 0 comments

santilland commented Sep 10, 2024

We have multiple (nearly) static collections that are quite large and are being regenerated, which takes a long time and creates many items.
It would be great if we could make use of GeoParquet here. I think there are multiple potential approaches.

  1. Potentially the most straightforward one: for large time series collections, instead of creating individual items, we create one GeoParquet file. The caveats I foresee are:
  • the bubbled-up information would not be available, so we would need a dedicated handler in eodash to fetch the important information from the GeoParquet file (not sure if it is possible to extract only one property?)
  • how would we stop pystac from actually trying to save results when saving the catalog?
    • maybe we can "manually" create the GeoParquet file and add the link reference, without actually adding the items to the collection
  2. The other integration would be to somehow make it configurable whether GeoParquet should be used as the output format. Potential caveats:
  • should we consider an update logic approach? E.g. if the file is already present, only update it if an update flag is set?
  • (same issues in the client as in 1)

Any other considerations? Let's brainstorm!
Inputs:
