Feature Engineering with OpenEO - Use Case 1 #190
@jdries @juansensio @jamesemwheeler As a user, I want to make use of the EuroCrops dataset in EOTDL, create a filtered subset (EOTDL functionality) and use openEO from within EOTDL to generate predictive features from S1 and S2 time series, then train a model in EOTDL, and run inference with that model in CDSE.
Define the list of features that we want to compute for this task. We can reuse the S1 and S2 pipelines from WorldCereal (features already validated).
Below I share an example of how we typically access custom STAC collections: openeo-community-examples/python/LoadStac/load-stac-item-example.ipynb
The example provided in: Feels like a more natural approach and a workflow we could provide as well. @juansensio could you clarify whether you want openEO to access the EuroCropsDataset, or whether we want to extract S1 and S2 data that match the spatio-temporal bounds of the EuroCropsDataset? I believe openEO would be better suited to:
@juansensio @Patrick1G any feedback on how best to steer this use-case?
Patrick knows more about the use case, but as far as I understand the EuroCrops dataset contains crop classes for parcel polygons, so the goal would be to pair it with additional variables derived from S1/S2 (for example yearly mean NDVI). openEO should be used to get these variables through a feature engineering pipeline, so we can use them to train a model and then re-use the pipeline at inference time. Here we can delegate the entire process to openEO, or rely on EOTDL to retrieve the geometries from the STAC catalog and pass them to openEO. I guess the second option is better, since we do not need openEO to access the dataset in EOTDL directly (just pass the resulting STAC catalog with geometries).
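To make the "yearly mean NDVI" example concrete, here is a minimal sketch in plain Python. It is not the openEO pipeline discussed in this thread; the function names and the reflectance values are hypothetical and only illustrate the per-parcel feature that would be paired with a EuroCrops crop class.

```python
from statistics import fmean


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for a single observation."""
    total = nir + red
    if total == 0:
        raise ValueError("NDVI is undefined when nir + red == 0")
    return (nir - red) / total


def yearly_mean_ndvi(observations):
    """Mean NDVI over (nir, red) pairs that all belong to the same year."""
    values = [ndvi(nir, red) for nir, red in observations]
    if not values:
        raise ValueError("no observations for this parcel")
    return fmean(values)


# Hypothetical S2 reflectances for one parcel (B08 = NIR, B04 = red).
parcel_obs = [(0.40, 0.10), (0.50, 0.10), (0.30, 0.10)]
parcel_feature = yearly_mean_ndvi(parcel_obs)
```

The same function would have to run unchanged at training and at inference time, which is the point of keeping the feature engineering in one reusable pipeline.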
@HansVRP @juansensio the use case is described in detail above. Let's follow those steps please.
Next steps then:
Not quite sure how step #2 above should be done: EuroCrops contains millions of parcel polygons, and to train a model we only need a subset, e.g. constrained to a country, selected crop types and a random selection of n polygons within that selection. I don't think openEO provides good functionality to do this, so it could be done in EOTDL with Python libraries. As a first step, this could also be done offline. To be discussed at next meeting.
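A minimal sketch of the subsetting described above (country, selected crop types, then a random draw of n parcels), using only the Python standard library. The record layout and the keys `country` and `crop` are assumptions for illustration; the actual EuroCrops attribute names may differ.

```python
import random


def select_parcels(parcels, country, crop_types, n, seed=42):
    """Return up to n randomly chosen parcels matching country and crop type."""
    wanted = set(crop_types)
    candidates = [
        p for p in parcels
        if p["country"] == country and p["crop"] in wanted
    ]
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return rng.sample(candidates, min(n, len(candidates)))


# Hypothetical parcel records standing in for EuroCrops features.
parcels = [
    {"id": 1, "country": "AT", "crop": "wheat"},
    {"id": 2, "country": "AT", "crop": "maize"},
    {"id": 3, "country": "AT", "crop": "barley"},
    {"id": 4, "country": "DK", "crop": "wheat"},
    {"id": 5, "country": "AT", "crop": "wheat"},
]
subset = select_parcels(parcels, country="AT", crop_types=["wheat", "maize"], n=2)
```

Seeding the generator means the same subset can be regenerated later, which helps when the selection is first done offline and then moved into EOTDL.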
okay already have a first version up on https://github.com/earthpulse/eotdl/tree/hv_openeoexample Todo
@juansensio Does EOTDL have a dedicated CDSE S3 storage which we can use to save the results into?
For S2 I used Best Available Pixel composites, which create S2 monthly composites with a minimum amount of clouds. Afterwards I calculated some typical features (percentiles). For S1 I used a similar approach. Please let me know your thoughts.
@HansVRP the resources above are not accessible. But it's important to keep the EO science aspects in mind here: we need to generate features/metrics at a high temporal frequency, as this is the critical information for crop type prediction, so 5/7 or 10-day interval metrics, not monthly BAP composites. Therefore I would suggest using a similar feature engineering approach to the one in the S1metrics notebook above: {min, mean, max, stddev, Q25, Q50, Q75, Q90}, and generating this for e.g. a 10-day interval for the year of the EuroCrops dataset.
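One possible reading of the suggestion above, sketched in plain Python rather than openEO: group a time series into fixed 10-day windows and compute {min, mean, max, stddev, Q25, Q50, Q75, Q90} per window. The helper names, the linear-interpolation percentile and the sample values are assumptions, not taken from the S1metrics notebook.

```python
from datetime import date
from statistics import fmean, pstdev


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0..100) of a sorted, non-empty list."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def interval_metrics(series, start, interval_days=10):
    """Group (date, value) samples into fixed windows and summarise each window."""
    windows = {}
    for day, value in series:
        index = (day - start).days // interval_days  # window number since start
        windows.setdefault(index, []).append(value)
    metrics = {}
    for index, values in sorted(windows.items()):
        values.sort()
        metrics[index] = {
            "min": values[0],
            "mean": fmean(values),
            "max": values[-1],
            "stddev": pstdev(values),  # population standard deviation
            "Q25": percentile(values, 25),
            "Q50": percentile(values, 50),
            "Q75": percentile(values, 75),
            "Q90": percentile(values, 90),
        }
    return metrics


# Hypothetical backscatter or index values for one parcel.
series = [
    (date(2021, 1, 1), 1.0),
    (date(2021, 1, 4), 2.0),
    (date(2021, 1, 8), 3.0),
    (date(2021, 1, 12), 5.0),
]
features = interval_metrics(series, start=date(2021, 1, 1))
```

Windows without any observation are simply absent from the result, so a real pipeline would still need a gap-filling or masking step before training.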
@Patrick1G @jdries please review the current version. Here I used weekly composites, from which I calculate the P10, P25, P50, P75 and P90 percentiles. The statistics can easily be expanded if required. However, for now I kept them more limited as I run the statistics across
Notebook updated at https://github.com/earthpulse/eotdl/blob/main/tutorials/usecases/openEO/use_case_1.ipynb Blocked by an error with openeo, @HansVRP can you provide some feedback? Maybe it is an issue with openeo versions?
which version are you currently using?
will take a look next week
Note: GeoDB only stores the STAC metadata. For the kind of filtering proposed, we need the actual data (crop type), which is not in the STAC metadata. Hence, we will not be able to do this filtering directly with GeoDB nor with the STAC metadata (even locally), so Q1/Q2 will not be useful at all. Discuss in next progress meeting @Patrick1G
@juansensio You receive this error because you have a preexisting job_tracker database (the jobs.csv file) from an earlier run. During development you need to remove the current jobs.csv file, or create a job tracker with a different name, before rerunning the cell.
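A small sketch of the two workarounds mentioned above (remove the stale tracker file, or use a different name), written with the standard library only. `fresh_tracker_path` is a hypothetical helper, not part of openeo, and the `jobs.csv` name is taken from this thread.

```python
from pathlib import Path


def fresh_tracker_path(path, overwrite=False):
    """Return a tracker path that will not collide with one from an earlier run."""
    path = Path(path)
    if not path.exists():
        return path
    if overwrite:
        path.unlink()  # drop the stale tracker so the next run starts clean
        return path
    counter = 1
    while True:
        # jobs.csv -> jobs_1.csv, jobs_2.csv, ... until an unused name is found
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


# Default mode never deletes anything; it only picks an unused file name.
tracker_file = fresh_tracker_path("jobs.csv")
```

Calling it with `overwrite=True` corresponds to deleting jobs.csv by hand, which also discards the record of jobs that were already submitted.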
I'm btw logging an issue to create a clearer user facing warning on this |
I deleted the current jobs.csv and jobs.parquet but I still get the error @HansVRP
This seems to be an issue with the operation used in start_job. I believe the geometries passed are not in the proper GeoJSON format. I'll take a look at the changes you've made and see whether I can get it to run on your current input.
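The thread does not say what exactly is wrong with the geometries, so the following is only a generic pre-flight check one could run before passing them to start_job. It is a sketch with a hypothetical helper name that tests two basic GeoJSON rules for Polygon and MultiPolygon geometries: rings need at least four positions, and the first and last position must be identical.

```python
def geojson_geometry_problems(geometry):
    """Return a list of problems found in a GeoJSON Polygon/MultiPolygon dict."""
    if not isinstance(geometry, dict):
        return ["geometry is not a JSON object"]
    kind = geometry.get("type")
    coords = geometry.get("coordinates")
    if kind not in ("Polygon", "MultiPolygon"):
        return [f"unsupported geometry type: {kind!r}"]
    if not isinstance(coords, list) or not coords:
        return ["coordinates are missing or empty"]
    # Treat a Polygon as a MultiPolygon with a single member.
    polygons = [coords] if kind == "Polygon" else coords
    problems = []
    for polygon in polygons:
        for ring in polygon:
            if len(ring) < 4:
                problems.append("ring has fewer than 4 positions")
            elif ring[0] != ring[-1]:
                problems.append("ring is not closed")
    return problems


# Hypothetical inputs: a valid triangle, an unclosed ring, and a Feature wrapper.
closed = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}
unclosed = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1]]]}
feature = {"type": "Feature", "geometry": closed, "properties": {}}

report = {
    "closed": geojson_geometry_problems(closed),
    "unclosed": geojson_geometry_problems(unclosed),
    "feature": geojson_geometry_problems(feature),
}
```

Passing a whole Feature where a bare geometry is expected is reported as an unsupported type here, which is one common way such a mismatch shows up.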
feature engineering for parcels in eurocrops (temporal aggregation on some indices, for example)