Implemented scripts to build the STAC catalogue from the extracted patches. The procedure is done in three steps:

`scripts/stac/build_paths.py`

Parses a folder of S1/S2 patches (for example `/vitodata/worldcereal_data/EXTRACTIONS/SENTINEL2/`) and writes a pickled list of the patch paths, to be parallelized and parsed by the next script.
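For reference, this first step conceptually boils down to something like the sketch below; the input folder, the `*.nc` file pattern and the output filename are illustrative assumptions, not the script's actual arguments:

```python
import pickle
from pathlib import Path

# Hypothetical locations; the real script takes these as arguments.
PATCH_ROOT = Path("/vitodata/worldcereal_data/EXTRACTIONS/SENTINEL2/")
OUTPUT_FILE = Path("s2_patch_paths.pkl")

# Collect every extracted patch (assuming NetCDF files) under the root folder.
patch_paths = sorted(PATCH_ROOT.rglob("*.nc"))

# Pickle the list of paths so the Spark step can parallelize over it.
with open(OUTPUT_FILE, "wb") as f:
    pickle.dump(patch_paths, f)

print(f"Collected {len(patch_paths)} patch paths into {OUTPUT_FILE}")
```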
`scripts/stac/cataloguer_builder.py`

Uses the Spark cluster to read the extracted patches and build catalogues; each separate catalogue is then pickled to a different output file in a specified output folder.
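Roughly, this step works like the sketch below. Plain PySpark is shown for illustration (the real run goes through mepsy, see below), `patch_to_item` is a hypothetical stand-in for the actual patch-parsing logic, and the input/output locations are assumptions as well:

```python
import pickle
from datetime import datetime, timezone
from pathlib import Path

import pystac
from pyspark import SparkContext

OUTPUT_DIR = Path("catalogues/")  # assumed output folder on a shared filesystem


def patch_to_item(path: Path) -> pystac.Item:
    """Hypothetical placeholder: the real logic reads the patch file and derives
    geometry, datetime and asset metadata from it."""
    return pystac.Item(
        id=path.stem,
        geometry=None,
        bbox=None,
        datetime=datetime.now(timezone.utc),
        properties={},
    )


def build_partition_catalogue(index, paths):
    """Build one catalogue per Spark partition and pickle it to its own file."""
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    catalogue = pystac.Catalog(id=f"partial-{index}", description="Partial patch catalogue")
    for path in paths:
        catalogue.add_item(patch_to_item(path))
    out_file = OUTPUT_DIR / f"catalogue_{index}.pkl"
    with open(out_file, "wb") as f:
        pickle.dump(catalogue, f)
    yield str(out_file)


if __name__ == "__main__":
    # Load the list of patch paths produced by build_paths.py.
    with open("s2_patch_paths.pkl", "rb") as f:
        patch_paths = pickle.load(f)

    sc = SparkContext(appName="stac-cataloguer")
    written = (
        sc.parallelize(patch_paths, numSlices=64)
        .mapPartitionsWithIndex(build_partition_catalogue)
        .collect()
    )
    print(f"Wrote {len(written)} partial catalogues")
```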
`scripts/stac/split_catalogues.py`

Merges all the catalogues built by the previous script and splits them by UTM zone, as required by OpenEO. The split relies on GFMAP's `openeo_gfmap.utils.split_stac.split_collection_by_epsg`. The output folder will contain each STAC collection fully written as JSON.
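And a rough sketch of this merge-and-split step, assuming the partial catalogues were pickled as in the previous sketch; the collection metadata is illustrative, and the exact call signature of `split_collection_by_epsg` should be checked against the GFMAP sources:

```python
import pickle
from pathlib import Path

import pystac
from openeo_gfmap.utils.split_stac import split_collection_by_epsg

CATALOGUE_DIR = Path("catalogues/")  # pickled partial catalogues from the previous step (assumed)
OUTPUT_DIR = Path("stac_output/")    # final STAC, one sub-collection per UTM zone / EPSG code

# Merge the partial catalogues into one collection. The id, description and
# placeholder extent are illustrative, not the values used in the real script.
merged = pystac.Collection(
    id="worldcereal-extractions",
    description="Merged catalogue of extracted patches",
    extent=pystac.Extent(
        spatial=pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]]),
        temporal=pystac.TemporalExtent([[None, None]]),
    ),
)
for pkl in sorted(CATALOGUE_DIR.glob("catalogue_*.pkl")):
    with open(pkl, "rb") as f:
        partial = pickle.load(f)
    for item in partial.get_all_items():
        merged.add_item(item)
merged.update_extent_from_items()

# Split by EPSG as required by OpenEO; the arguments shown here are an
# assumption, check the GFMAP implementation for the exact signature.
split_collection_by_epsg(merged, OUTPUT_DIR)
```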
In addition, I moved constants previously defined in the extraction scripts, such as the asset definitions, into worldcereal's source folder, since from now on multiple scripts will use those values. Finally, I disabled the creation/management of STAC assets from the GFMAPJobManager in the extraction scripts.
To ease the execution of the `catalogue_builder.py` script, I used the package `mepsy`, developed by Daniele, which makes it less of a hassle to run Spark jobs on our cluster. I didn't add this dependency to the package, as it is not publicly installable; it is accessible on our private git server: https://git.vito.be/projects/TAP-VEGTEAM/repos/mepsy/browse

We could make this more broadly usable by not relying on mepsy, but then again, not many people have a Spark cluster of their own, and everyone works with a different configuration. Let me know what you think...