Feature Engineering with OpenEO - Use Case 1 #190

earthpulse · 2024-06-10T10:31:58Z

Feature engineering for parcels in EuroCrops (for example, temporal aggregation of selected indices).

openEO should get the TDS from EOTDL
the user should define the processing pipeline (directly with openEO, or through an abstraction in EOTDL?)
execute the process in an openEO backend
ingest the outputs into EOTDL as a feature store

Patrick1G · 2024-10-09T07:34:25Z

@jdries @juansensio @jamesemwheeler
Here is a more detailed specification of this use case:

As a user, I want to make use of the EuroCrops dataset in EOTDL, create a filtered subset (EOTDL functionality), use openEO from within EOTDL to generate predictive features from S1 and S2 time series, then train a model in EOTDL, and run inference with that model in CDSE.

1. Find and explore the EuroCropsDataset and stage it in the EOTDL workspace.
2. Filter the EuroCropsDataset using EOTDL functionality to create a subset of parcels, e.g., 8 crop classes, each with 1000 examples, for one country.
3. Run feature engineering with openEO, creating temporal metrics from S1 and S2 time series (temporally optimised for the crop classes of interest). Store the feature engineering process graph with the training datasets in EOTDL.
4. Use EOTDL functionality to train a model (for this the features need to be retrieved). Store the model along with the openEO process graph in EOTDL.
5. Use the model to run inference (from within EOTDL?) on an openEO backend such as CDSE or openEO platform. Make use of the feature engineering process graph stored along with the EOTDL model.
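To make the "store the process graph along with the model" idea concrete, here is a minimal sketch. It assumes the process graph is kept as plain openEO JSON next to the model file; the folder layout, the file names and the one-node graph are hypothetical, and this is not EOTDL's actual storage API.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, heavily simplified openEO process graph (a single node only).
process_graph = {
    "process_graph": {
        "load_s2": {
            "process_id": "load_collection",
            "arguments": {
                "id": "SENTINEL2_L2A",
                "spatial_extent": None,
                "temporal_extent": ["2021-01-01", "2022-01-01"],
                "bands": ["B04", "B08"],
            },
            "result": True,
        }
    }
}


def save_bundle(folder: Path, model_bytes: bytes, graph: dict) -> Path:
    """Write the model and its feature engineering graph side by side."""
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "model.bin").write_bytes(model_bytes)
    (folder / "feature_engineering.json").write_text(json.dumps(graph, indent=2))
    return folder


def load_graph(folder: Path) -> dict:
    """Read the stored graph back so it can be reused at inference time."""
    return json.loads((folder / "feature_engineering.json").read_text())


# Placeholder bytes stand in for real model weights.
bundle_dir = save_bundle(
    Path(tempfile.mkdtemp()) / "crop_model_v1", b"placeholder-weights", process_graph
)
restored_graph = load_graph(bundle_dir)
```

Keeping both files in one folder means the inference step can always find the exact preprocessing that produced the training features.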

juansensio · 2024-10-29T14:01:35Z

Define the list of features that we want to compute for this task.

We can reuse the S1 and S2 pipelines from world cereal (features already validated).

HansVRP · 2024-10-30T13:27:06Z

Below is an example of how we typically access custom STAC collections:

openeo-community-examples/python/LoadStac/load-stac-item-example.ipynb

HansVRP · 2024-10-30T13:30:55Z

The example provided in:
https://github.com/earthpulse/eotdl/blob/main/tutorials/notebooks/forest-map.ipynb

Feels like a more natural approach and a workflow we could provide as well.

@juansensio could you clarify whether you want openEO to access the EuroCropsDataset, or whether we want to extract S1 and S2 data that match the spatio-temporal bounds of the EuroCropsDataset?

I believe openEO would be better suited to:

select a region of interest
define a desired preprocessing methodology (save it as a process graph)
download the preprocessed data
train the desired model on the data
combine the standardized preprocessing with the model to run inference

HansVRP · 2024-11-05T20:11:41Z

@juansensio @Patrick1G any feedback on how best to steer this use-case?

juansensio · 2024-11-06T14:38:31Z

Patrick knows more about the use case, but as far as I understand the EuroCrops dataset contains crop classes for parcel polygons, so the goal would be to pair it with additional variables derived from S1/S2 (for example yearly mean NDVI).
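As an illustration of such a derived variable, the sketch below computes NDVI from near-infrared and red reflectance and averages it over a year. The reflectance values are made up, and in practice this would run inside openEO rather than in local Python.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    if denominator == 0:
        return float("nan")  # undefined when both reflectances are zero
    return (nir - red) / denominator


# Made-up (NIR, red) reflectance pairs for one parcel over one year.
observations = [(0.40, 0.10), (0.50, 0.10), (0.30, 0.10)]

ndvi_values = [ndvi(nir, red) for nir, red in observations]
yearly_mean_ndvi = sum(ndvi_values) / len(ndvi_values)
```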

openEO should be used to get these variables through a feature engineering pipeline, so we can use them to train a model and then re-use the pipeline at inference time.

Here we can delegate the entire process to openEO, or rely on EOTDL to retrieve the geometries from the STAC catalog and pass them to openEO... I guess the second option is better since we do not need openEO to access the dataset in EOTDL directly (just pass the resulting STAC catalog with geometries).

Patrick1G · 2024-11-08T08:05:39Z

@HansVRP @juansensio the use case is described in detail above, so let's follow those steps please.

Indeed, EuroCrops contains many parcel polygons for which we want to create predictive features from S1 and S2 time series. So with openEO we want to generate the features for each polygon geometry (e.g. via the aggregate_spatial process).
Features should be computed at dense temporal intervals (e.g. weekly or 10-day, via aggregate_temporal). There are two openEO notebooks that do similar feature engineering:
1. crop type mapping
2. S1-Stats
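To show what the dense temporal aggregation amounts to, here is a small local sketch of 10-day binning. It assumes each observation is already a per-parcel value (the result of the spatial aggregation step); the function names and the sample series are hypothetical, and it only mimics the idea rather than calling openEO.

```python
from collections import defaultdict
from datetime import date, timedelta


def bin_start(day: date, origin: date, step_days: int = 10) -> date:
    """Return the first day of the fixed-length interval that contains `day`."""
    offset = (day - origin).days // step_days
    return origin + timedelta(days=offset * step_days)


def aggregate_temporal_mean(series, origin: date, step_days: int = 10) -> dict:
    """Group (date, value) pairs into intervals and average each interval."""
    bins = defaultdict(list)
    for day, value in series:
        bins[bin_start(day, origin, step_days)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(bins.items())}


# Made-up per-parcel values, e.g. a parcel-mean index per acquisition date.
series = [
    (date(2021, 1, 2), 0.2),
    (date(2021, 1, 8), 0.4),
    (date(2021, 1, 15), 0.6),
]
composites = aggregate_temporal_mean(series, origin=date(2021, 1, 1))
```

The first two observations fall into the interval starting on 1 January and the third into the one starting on 11 January, so the result has two entries.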

Next steps then:

training should happen in EOTDL
test inference run in CDSE
feature engineering process graph to be saved along with trained model in EOTDL - to be reused for inference

I'm not quite sure how step 2 above should be done: EuroCrops contains millions of parcel polygons, and to train a model we only need a subset, e.g. constrained to a country, selected crop types and a random selection of n polygons within that selection. I don't think openEO provides good functionality for this, so it could be done in EOTDL with Python libraries. As a first step, this could also be done offline. To be discussed at the next meeting.
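The subset selection described here could be sketched offline roughly as follows. The record layout (`country`, `crop_class` keys) and the function name are assumptions for illustration, not the actual EuroCrops schema or an EOTDL function.

```python
import random
from collections import defaultdict


def sample_parcels(parcels, country, crop_classes, n_per_class, seed=42):
    """Keep one country and the chosen crop classes, then draw up to
    n_per_class random parcels from each class."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_class = defaultdict(list)
    for parcel in parcels:
        if parcel["country"] == country and parcel["crop_class"] in crop_classes:
            by_class[parcel["crop_class"]].append(parcel)

    subset = []
    for crop_class in sorted(by_class):
        candidates = by_class[crop_class]
        k = min(n_per_class, len(candidates))  # a class may have too few parcels
        subset.extend(rng.sample(candidates, k))
    return subset


# Synthetic stand-in for parcel records; real parcels would carry geometries too.
parcels = [
    {
        "id": i,
        "country": "AT" if i % 2 else "BE",
        "crop_class": ["wheat", "maize", "barley"][i % 3],
    }
    for i in range(600)
]
subset = sample_parcels(parcels, "AT", {"wheat", "maize"}, n_per_class=20)
```

Sampling per class rather than over the whole selection keeps the classes balanced, which is what the "8 crop classes, each with 1000 examples" target implies.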

HansVRP · 2024-11-08T09:57:18Z

okay already have a first version up on https://github.com/earthpulse/eotdl/tree/hv_openeoexample

Todo

properly combine the geometries to reduce the total cost
Optimize the openeo settings for data extraction.

@juansensio Does EOTDL have a dedicated CDSE S3 storage that we can use to save the results to?

HansVRP · 2024-11-08T15:37:46Z

@Patrick1G @jdries

For S2 I used Best Available Pixel (BAP) composites, which create monthly composites with a minimal amount of cloud. Afterwards I calculated some typical features (percentiles).
https://github.com/earthpulse/eotdl/blob/hv_openeoexample/tutorials/notebooks/openeo/generate_s3_UDP.py

For S1 I used a similar approach
https://github.com/earthpulse/eotdl/blob/hv_openeoexample/tutorials/notebooks/openeo/generate_s1_UDP.py

Please let me know your thoughts

Patrick1G · 2024-11-08T15:49:18Z

@HansVRP resources above are not accessible..

But it's important to keep the EO science aspects in mind here: we need to generate features/metrics at a high temporal frequency, as this is the critical information for crop type prediction, so metrics at 5-, 7- or 10-day intervals, not monthly BAP composites. Therefore I would suggest using a feature engineering approach similar to the one in the S1 metrics notebook mentioned above: {min, mean, max, stddev, Q25, Q50, Q75, Q90}, generated for e.g. 10-day intervals for the year of the EuroCrops dataset.
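For reference, the metric set listed here can be computed for one interval like this. It is a local sketch using only the standard library; the use of the population standard deviation and of linear interpolation for the quantiles are my assumptions, and the backscatter values are made up.

```python
import statistics


def interval_metrics(values):
    """Compute {min, mean, max, stddev, Q25, Q50, Q75, Q90} for one interval."""
    if len(values) < 2:
        raise ValueError("need at least two observations per interval")
    # 99 cut points; cuts[k - 1] is the k-th percentile (linear interpolation).
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "min": min(values),
        "mean": statistics.fmean(values),
        "max": max(values),
        "stddev": statistics.pstdev(values),  # population standard deviation
        "Q25": cuts[24],
        "Q50": cuts[49],
        "Q75": cuts[74],
        "Q90": cuts[89],
    }


# Made-up S1 backscatter values (dB) falling within one 10-day interval.
metrics = interval_metrics([-15.0, -14.0, -13.0, -12.0, -11.0])
```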

HansVRP · 2024-11-19T12:38:54Z

@Patrick1G @jdries please review the current version.

Here I used weekly composites, from which I calculate the P10, P25, P50, P75 and P90 percentiles.

The statistics can easily be expanded if required. For now I kept them limited, because I run the statistics across 10 S2 bands and 2 S1 bands, which already results in a netCDF with 60 bands.
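The band count follows from 12 input bands times 5 percentiles. A quick sketch of how the output band names could be enumerated; the specific band lists and the naming pattern are assumptions, since the thread does not say which 10 S2 bands are used.

```python
from itertools import product

# Assumed band selection: 10 Sentinel-2 bands and 2 Sentinel-1 polarisations.
S2_BANDS = ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
S1_BANDS = ["VV", "VH"]
PERCENTILES = [10, 25, 50, 75, 90]

# One output band per (input band, percentile) pair, e.g. "B04_P50".
feature_names = [
    f"{band}_P{p}" for band, p in product(S2_BANDS + S1_BANDS, PERCENTILES)
]
```

Adding the full {min, mean, max, stddev, ...} set on top of this would multiply the band count accordingly, which is why the statistics were kept limited.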

juansensio · 2025-01-24T11:14:36Z

Notebook updated at https://github.com/earthpulse/eotdl/blob/main/tutorials/usecases/openEO/use_case_1.ipynb

Blocked by an error with openeo. @HansVRP, can you provide some feedback?

Maybe it is an issue with openeo versions?

HansVRP · 2025-01-24T12:03:24Z

which version are you currently using?

juansensio · 2025-01-24T12:28:26Z

The error shown occurred with version 0.31.0.

I upgraded to 0.37.0 and now see that the first parameter to run_jobs is optional, but I am still getting the following error:

HansVRP · 2025-01-24T12:55:59Z

will take a look next week

juansensio · 2025-01-24T13:03:45Z

Note: GeoDB only stores the STAC metadata. For the kind of filtering proposed, we need the actual data (crop type), which is not in the STAC metadata. Hence, we will not be able to do this filtering directly with GeoDB nor with the STAC metadata (even locally), so Q1/Q2 will not be useful at all. Discuss in next progress meeting @Patrick1G

HansVRP · 2025-01-27T14:55:33Z

@juansensio The reason you receive this error is because you have a preexisting database: job_tracker (the jobs.csv file).

During development you need to remove the current jobs.csv file, or create a job tracker with a different name, before rerunning the cell.
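One way to avoid tripping over a stale tracker while iterating on a notebook is to pick a fresh file name whenever the default one already exists. This is a sketch only: `fresh_tracker_path` is a hypothetical helper, and the resulting path would still have to be handed to the job manager in whatever way the notebook already does.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def fresh_tracker_path(folder: Path, stem: str = "jobs", reuse: bool = False) -> Path:
    """Return a tracker path that does not collide with an earlier run.

    With reuse=True the existing file is returned on purpose, e.g. to
    resume a run instead of starting from scratch.
    """
    default = folder / f"{stem}.csv"
    if reuse or not default.exists():
        return default
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return folder / f"{stem}_{stamp}.csv"


# Demo in a temporary folder: the second call sees the stale jobs.csv.
work_dir = Path(tempfile.mkdtemp())
first_path = fresh_tracker_path(work_dir)
first_path.write_text("id,status\n")  # simulate a tracker left by an earlier run
second_path = fresh_tracker_path(work_dir)
```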

HansVRP · 2025-01-27T14:58:22Z

I'm btw logging an issue to create a clearer user facing warning on this

juansensio · 2025-01-28T08:42:57Z

I deleted the current jobs.csv and jobs.parquet but still get the error @HansVRP

HansVRP · 2025-01-28T09:03:42Z

This seems to be an issue with the operation used in start_job.

I believe the geometries passed are not in the proper GeoJSON format. I'll take a look at the changes you've made and see whether I can get it to run on your current input.
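A quick structural check can help narrow this kind of problem down before submitting a job. The sketch below assumes the geometries are passed as a GeoJSON FeatureCollection of polygons; it is a generic sanity check, not the validation openEO itself performs, and the thread does not confirm what was actually wrong with the input.

```python
def geojson_problems(obj) -> list:
    """Return a list of human-readable problems; an empty list means it looks OK."""
    if not isinstance(obj, dict) or obj.get("type") != "FeatureCollection":
        return ["top-level object is not a FeatureCollection"]
    features = obj.get("features")
    if not isinstance(features, list):
        return ["'features' is missing or not a list"]

    problems = []
    for i, feature in enumerate(features):
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            problems.append(f"feature {i}: not a Feature object")
            continue
        geom = feature.get("geometry")
        if not isinstance(geom, dict) or geom.get("type") not in {"Polygon", "MultiPolygon"}:
            problems.append(f"feature {i}: geometry is not a Polygon or MultiPolygon")
            continue
        coords = geom.get("coordinates")
        if not isinstance(coords, list) or not coords:
            problems.append(f"feature {i}: empty or missing coordinates")
            continue
        # A Polygon is a list of rings; a MultiPolygon is a list of polygons.
        rings = coords if geom["type"] == "Polygon" else [r for poly in coords for r in poly]
        for ring in rings:
            if len(ring) < 4 or ring[0] != ring[-1]:
                problems.append(f"feature {i}: ring is not closed or has fewer than 4 positions")
    return problems


square = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
good = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {}, "geometry": {"type": "Polygon", "coordinates": [square]}}
    ],
}
bad = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {}, "geometry": {"type": "Polygon", "coordinates": [square[:-1]]}}
    ],
}
good_report = geojson_problems(good)
bad_report = geojson_problems(bad)
```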

earthpulse assigned Schpidi, jamesemwheeler, Patrick1G, achtsnits, dmoglioni, juansensio and jdries Jun 10, 2024

juansensio added this to the v1.9 milestone Jul 18, 2024

juansensio modified the milestones: v1.9, v1.10 Oct 3, 2024

juansensio modified the milestones: v1.10, v1.11 Nov 7, 2024

This was referenced Dec 5, 2024

Feature Engineering with OpenEO - library integration #193

Closed

Feature Engineering with OpenEO - backend connection #192

Closed

Feature Engineering with OpenEO - Use Case 2 #191

Closed

juansensio modified the milestones: v1.11, v1.12 Dec 5, 2024

juansensio mentioned this issue Dec 5, 2024

feature engineering via openEO UDPs #142

Closed

HansVRP closed this as completed Dec 5, 2024

juansensio closed this as completed Jan 24, 2025

HansVRP reopened this Jan 24, 2025

HansVRP assigned HansVRP and unassigned Schpidi, jdries, jamesemwheeler, Patrick1G, achtsnits, dmoglioni and juansensio Jan 24, 2025

HansVRP closed this as completed Jan 27, 2025

HansVRP assigned juansensio and unassigned HansVRP Jan 27, 2025

juansensio reopened this Jan 28, 2025

HansVRP assigned HansVRP and unassigned juansensio Jan 28, 2025

juansensio mentioned this issue Jan 29, 2025

expose EuroCrops at granular level #122

Open

HansVRP closed this as completed Jan 30, 2025

juansensio reopened this Jan 30, 2025

juansensio modified the milestones: v1.13, v1.14 Feb 6, 2025
