
Investigate options for bulk loading STAC records #136

Open
batpad opened this issue Feb 7, 2025 · 0 comments
batpad commented Feb 7, 2025

I see that there is a fair bit of complexity being added to the ETL code to handle chunking and batching the loading of STAC records into the pg-stac db: https://github.com/IFRCGo/montandon-etl/blob/develop/apps/etl/load/sources/base.py#L37

I can see that this is required since we are using the POST endpoint on stac-api to add individual STAC records: https://github.com/IFRCGo/montandon-etl/blob/develop/apps/etl/load/sources/base.py#L11
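For context, the pattern boils down to something like the sketch below (a minimal illustration, not the actual base.py code; the API URL, collection id, and chunk size are placeholders):

```python
import requests

STAC_API_URL = "https://example.com/stac"  # placeholder, not our deployment
COLLECTION_ID = "example-collection"       # placeholder
CHUNK_SIZE = 100                           # placeholder


def chunked(items, size):
    """Yield successive chunks of `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def load_items(items):
    """POST each STAC item individually to the Transactions endpoint."""
    for chunk in chunked(items, CHUNK_SIZE):
        for item in chunk:
            resp = requests.post(
                f"{STAC_API_URL}/collections/{COLLECTION_ID}/items",
                json=item,
                timeout=30,
            )
            resp.raise_for_status()
```

So each item costs one HTTP round trip, which is why the chunking and retry bookkeeping ends up living in the ETL code.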

I do think this is likely the best approach to go with, but I'm opening this ticket to see if there are better ways to bulk load records into STAC that would save us from having to do this chunk management on the ETL side.

For bulk loading items into STAC, there are very efficient ways to do this by talking to the pgstac database directly. I don't know if there's a good way to bulk load through the HTTP API.
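One HTTP-side option worth verifying: stac-fastapi based deployments can accept a whole ItemCollection body on the items endpoint, which would let us send a chunk of items in a single request. I haven't checked whether our stac-api supports this, so treat the below as a sketch to test (URL and collection id are placeholders):

```python
import requests

STAC_API_URL = "https://example.com/stac"  # placeholder, not our deployment
COLLECTION_ID = "example-collection"       # placeholder


def post_item_collection(items):
    """POST a chunk of items as a single ItemCollection (FeatureCollection).

    Whether the Transactions implementation accepts an ItemCollection body
    depends on the deployed stac-api, so this needs to be verified first.
    """
    body = {"type": "FeatureCollection", "features": list(items)}
    resp = requests.post(
        f"{STAC_API_URL}/collections/{COLLECTION_ID}/items",
        json=body,
        timeout=60,
    )
    resp.raise_for_status()
```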

It would be technically possible to connect directly to the pg-stac db from the ETL process and use the pgstac bulk load methods. But I can see that it's nicer and cleaner to use the HTTP API to add records.
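If we did go the direct-DB route, pypgstac ships loader utilities that handle batching internally; roughly something like the sketch below (the DSN is a placeholder, and the exact Loader API and insert modes should be checked against the pypgstac version backing our deployment):

```python
from pypgstac.db import PgstacDB
from pypgstac.load import Loader, Methods

# Placeholder DSN; in practice this would come from ETL settings/secrets.
DSN = "postgresql://user:password@host:5432/postgis"


def bulk_load_items(items):
    """Load an iterable of STAC item dicts straight into pgstac.

    pypgstac's Loader batches the inserts itself; `Methods.upsert`
    replaces any existing items with the same id.
    """
    with PgstacDB(dsn=DSN) as db:
        loader = Loader(db=db)
        loader.load_items(items, insert_mode=Methods.upsert)
```

There is also a `pypgstac load items ...` CLI for loading from ndjson files, but either way this means handing the ETL process direct database credentials, which is the trade-off against the cleaner HTTP path.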

Just opening this ticket to evaluate our options - I think the current approach is also likely fine.

cc @emmanuelmathot @samshara @subinasr
