Perform preprocessing to geoparquet from patch extractions #101

Open
4 tasks done
kvantricht opened this issue Jul 8, 2024 · 9 comments

@kvantricht
Contributor

kvantricht commented Jul 8, 2024

Starting from patch extractions, we need a workflow that performs:

  • load the extractions as openEO datacubes
  • perform preprocessing
  • perform pixel sampling
  • export the result as geoparquet
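
As a rough illustration of the pixel sampling step only, here is a minimal, library-free sketch. It assumes a north-up patch stored as nested lists, a square pixel size, and points in the same CRS as the patch; `sample_patch`, `sample_id` and the row layout are hypothetical names and not part of the actual workflow, which runs on openEO datacubes.

```python
from math import floor


def sample_patch(patch, origin_x, origin_y, pixel_size, points):
    """Sample one time series per point from a single patch.

    patch: list over time steps of 2D grids (lists of rows), north-up.
    origin_x, origin_y: map coordinates of the patch's top-left corner.
    pixel_size: pixel edge length in map units (assumed square pixels).
    points: iterable of (sample_id, x, y) in the same CRS as the patch.
    Returns one flat row (dict) per point that falls inside the patch.
    """
    n_rows = len(patch[0])
    n_cols = len(patch[0][0])
    rows = []
    for sample_id, x, y in points:
        # Map coordinates to array indices; y decreases as the row index grows.
        col = floor((x - origin_x) / pixel_size)
        row = floor((origin_y - y) / pixel_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            rows.append({
                "sample_id": sample_id,
                "timeseries": [grid[row][col] for grid in patch],
            })
    return rows
```

Each returned row corresponds to one record of the flat table that would eventually be written to geoparquet; points outside the patch are skipped.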

kvantricht added this to the System V2 milestone on Jul 8, 2024
@VincentVerelst
Contributor

Currently, load_stac on the CDSE backend cannot load a collection that spans multiple CRSs: Open-EO/openeo-geopyspark-driver#827

@VincentVerelst
Contributor

Implemented a STAC splitter in gfmap to circumvent the load_stac issue: Open-EO/openeo-gfmap#146
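
The idea behind such a splitter can be sketched as follows. This is not the gfmap implementation, just a minimal illustration that assumes each STAC item is a plain dict carrying the projection extension's `proj:epsg` property; `split_items_by_epsg` is a hypothetical helper name.

```python
from collections import defaultdict


def split_items_by_epsg(items):
    """Group STAC item dicts by their `proj:epsg` property.

    Each resulting group covers a single CRS, so it can be loaded with
    its own load_stac call. Items without the property land under None.
    """
    groups = defaultdict(list)
    for item in items:
        epsg = item.get("properties", {}).get("proj:epsg")
        groups[epsg].append(item)
    return dict(groups)
```

Each group could then be written out as its own collection and loaded separately, which sidesteps the multi-CRS limitation at the cost of one load per EPSG zone.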

@kvantricht
Contributor Author

Needs #125 first.

@jdries
Contributor

jdries commented Sep 4, 2024

Moving to a public mount is possible, but this would trigger a huge update of the STAC metadata.

@VincentVerelst
Contributor

  • Successfully loaded precomposited METEO from STAC on Terrascope; this gives a huge performance boost
  • Was able to generate a parquet file with preprocessed METEO, DEM, S1 and S2 time series, using some random points
  • /vitodata/worldcereal_data should be available on openeo-dev.vito.be soon (waiting on integration tests). This would mean we can load METEO and the Sentinel-1 and Sentinel-2 patches from STAC from there
  • Interaction with the RDM API still depends on two issues; see: Setup ground truth geoparquet generation pipeline using RDM API #125

@kvantricht
Contributor Author

That's great progress Vincent, thanks! So once the RDM interaction is there, we have in principle everything in place to start gradually building this geoparquet and performing some downstream tests on it?

@VincentVerelst
Contributor

> That's great progress Vincent, thanks! So once the RDM interaction is there, we have in principle everything in place to start gradually building this geoparquet and performing some downstream tests on it?

Indeed!

@kvantricht
Contributor Author

Conceptual pipeline:

  • Per ref_id and per EPSG zone, get the temporal_extent from the STAC metadata, filtered on the patches belonging to that ref_id
  • The RDM interaction module gets geometries from RDM based on the BBOX of all patches in the job and the temporal extent
  • To be implemented: some way to distinguish original seed features from collateral ones for downstream tracking (@cbutsko )
  • The resulting geodataframe, containing all required attributes, is written as geoparquet to a temporary folder on Artifactory
  • This geodataframe is immediately used (load_url) to launch the patch-to-point script with the same AOI and temporal extent
  • The final result is downloaded as a separate geoparquet
  • To be implemented: some post_job_action that allows this result to be merged safely into the global geoparquet
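
The first two steps above (grouping per ref_id and EPSG zone, then deriving the temporal extent and overall BBOX per job) can be sketched with plain dicts. This is only an illustration under assumptions: `plan_jobs` is a hypothetical helper, and the items are assumed to expose `ref_id`, `proj:epsg`, `start_datetime` and `end_datetime` properties plus a `bbox`; the real STAC metadata may be organised differently.

```python
from collections import defaultdict
from datetime import datetime


def plan_jobs(items):
    """Group patch items per (ref_id, EPSG) and derive each job's extents.

    Assumed item layout (hypothetical): properties with `ref_id`,
    `proj:epsg`, `start_datetime`, `end_datetime`, and a top-level
    `bbox` given as [west, south, east, north].
    """
    groups = defaultdict(list)
    for item in items:
        props = item["properties"]
        groups[(props["ref_id"], props["proj:epsg"])].append(item)

    jobs = []
    for (ref_id, epsg), members in sorted(groups.items()):
        # Timestamps without a trailing "Z" parse on all supported Pythons;
        # a "Z" suffix requires Python 3.11+ for datetime.fromisoformat.
        starts = [datetime.fromisoformat(m["properties"]["start_datetime"])
                  for m in members]
        ends = [datetime.fromisoformat(m["properties"]["end_datetime"])
                for m in members]
        bboxes = [m["bbox"] for m in members]
        jobs.append({
            "ref_id": ref_id,
            "epsg": epsg,
            "temporal_extent": [min(starts).date().isoformat(),
                                max(ends).date().isoformat()],
            # Union of all patch bounding boxes in this job.
            "bbox": [min(b[0] for b in bboxes), min(b[1] for b in bboxes),
                     max(b[2] for b in bboxes), max(b[3] for b in bboxes)],
        })
    return jobs
```

Each job dict holds what the RDM query would need (BBOX and temporal extent) and what the subsequent patch-to-point run would reuse as AOI and temporal extent.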

@VincentVerelst
Contributor

Blocked by GDAL errors when using load_stac to load the patch extractions

3 participants