
Investigate options for bulk loading STAC records #136

Open
batpad opened this issue Feb 7, 2025 · 0 comments
batpad commented Feb 7, 2025

I see that there is a fair bit of complexity being added to the ETL code to handle chunking and batching the loading of STAC records into the pg-stac db: https://github.com/IFRCGo/montandon-etl/blob/develop/apps/etl/load/sources/base.py#L37

I can see that this is required since we are using the POST endpoint on stac-api to add individual STAC records: https://github.com/IFRCGo/montandon-etl/blob/develop/apps/etl/load/sources/base.py#L11
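For context, the pattern boils down to something like the sketch below (a minimal illustration, not the actual base.py code; the API URL, collection id, and chunk size are placeholders):

```python
import requests

STAC_API_URL = "https://example.com/stac"  # placeholder, not our deployment
COLLECTION_ID = "example-collection"       # placeholder
CHUNK_SIZE = 100                           # placeholder


def chunked(items, size):
    """Yield successive chunks of `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def load_items(items):
    """POST each STAC item individually to the Transactions endpoint."""
    for chunk in chunked(items, CHUNK_SIZE):
        for item in chunk:
            resp = requests.post(
                f"{STAC_API_URL}/collections/{COLLECTION_ID}/items",
                json=item,
                timeout=30,
            )
            resp.raise_for_status()
```

So each item costs one HTTP round trip, which is why the chunking and retry bookkeeping ends up living in the ETL code.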

I do think this is likely the best approach to go with, but I'm opening this ticket to see if there are better ways to bulk load records into STAC that would save us from having to do this chunk management on the ETL side.

For bulk loading items into STAC, there are very efficient ways to do this by talking to the pgstac database directly. I don't know if there's a good way to bulk load through the HTTP API.
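One HTTP-side option worth verifying: stac-fastapi based deployments can accept a whole ItemCollection body on the items endpoint, which would let us send a chunk of items in a single request. I haven't checked whether our stac-api supports this, so treat the below as a sketch to test (URL and collection id are placeholders):

```python
import requests

STAC_API_URL = "https://example.com/stac"  # placeholder, not our deployment
COLLECTION_ID = "example-collection"       # placeholder


def post_item_collection(items):
    """POST a chunk of items as a single ItemCollection (FeatureCollection).

    Whether the Transactions implementation accepts an ItemCollection body
    depends on the deployed stac-api, so this needs to be verified first.
    """
    body = {"type": "FeatureCollection", "features": list(items)}
    resp = requests.post(
        f"{STAC_API_URL}/collections/{COLLECTION_ID}/items",
        json=body,
        timeout=60,
    )
    resp.raise_for_status()
```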

It would be technically possible to connect directly to the pg-stac db from the ETL process and use the pgstac bulk load methods. But I can see that it's nicer and cleaner to use the HTTP API to add records.
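If we did go the direct-DB route, pypgstac ships loader utilities that handle batching internally; roughly something like the sketch below (the DSN is a placeholder, and the exact Loader API and insert modes should be checked against the pypgstac version backing our deployment):

```python
from pypgstac.db import PgstacDB
from pypgstac.load import Loader, Methods

# Placeholder DSN; in practice this would come from ETL settings/secrets.
DSN = "postgresql://user:password@host:5432/postgis"


def bulk_load_items(items):
    """Load an iterable of STAC item dicts straight into pgstac.

    pypgstac's Loader batches the inserts itself; `Methods.upsert`
    replaces any existing items with the same id.
    """
    with PgstacDB(dsn=DSN) as db:
        loader = Loader(db=db)
        loader.load_items(items, insert_mode=Methods.upsert)
```

There is also a `pypgstac load items ...` CLI for loading from ndjson files, but either way this means handing the ETL process direct database credentials, which is the trade-off against the cleaner HTTP path.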

Just opening this ticket to evaluate our options - I think the current approach is also likely fine.

cc @emmanuelmathot @samshara @subinasr
