Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#237 aggregate_spatial needs to translate incoming geometries to same CRS as the data #238

Conversation

danielFlemstrom
Copy link

If the incoming geometries have a different crs, we now "translate" this to the same CRS as the data (since the xvec function assumes same crs without checking it).

I have added the line that i think should fix this. Please check that its correct and does not have any side-effects that I was unaware of.

@ValentinaHutter
Copy link
Collaborator

Thanks for adding this! Would be great if the change could also be test - in this case, tests live here: https://github.com/Open-EO/openeo-processes-dask/blob/main/tests/test_aggregate.py#L100

I think this might also be a nice addition to the xvec code - @masawdah does it make sense to you, to add this in xvec directly and update the definition there?

@@ -146,8 +146,9 @@ def aggregate_spatial(
gdf = gpd.GeoDataFrame(geometry=[polygon])
gdf.crs = DEFAULT_CRS

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif isinstance(geometries, gpd.GeoDataFrame):
gdf = geometries

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to allow the geometries to be a GeoDataFrame for the test to work...

@ValentinaHutter
Copy link
Collaborator

A simple test before the geometry_url test (https://github.com/Open-EO/openeo-processes-dask/blob/main/tests/test_aggregate.py#L150) could look like this:

    gdf = gpd.GeoDataFrame.from_features(polygon_geometry_small, crs="EPSG:4326")
    gdf_equi7 = gdf.to_crs(
        "+proj=aeqd +lat_0=53 +lon_0=24 +x_0=5837287.81977 +y_0=2121415.69617 +datum=WGS84 +units=m +no_defs"
    )
    output_cube_transform = aggregate_spatial(
        data=reduced_cube, geometries=gdf_equi7, reducer=reducer
    )
    assert len(output_cube_transform.dims) == len(output_cube.dims)
    assert output_cube_transform.shape == output_cube.shape

@ValentinaHutter
Copy link
Collaborator

I opened a PR on your fork to merge the latest updates of openeo-processes-dask into your branch.

@clausmichele
Copy link
Member

@ValentinaHutter trying to create an example using aggregate_spatial I also faced a related bug.
pydantic checks the CRS here: https://github.com/Open-EO/openeo-pg-parser-networkx/blob/50e71097929086beca7377ac6d909831a2ecb87c/openeo_pg_parser_networkx/pg_schema.py#L112

which is not compatible of what we get with a geoJSON string, which is a dictionary.

Here is a code to reproduce the issue:

import json
import geopandas as gpd
from openeo.local import LocalConnection
local_conn = LocalConnection("./")

url = "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
spatial_extent =  {"east": 11.40, "north": 46.52, "south": 46.46, "west": 11.25}
temporal_extent = ["2022-06-01", "2022-06-30"]
bands = ["red", "nir"]
properties = {"eo:cloud_cover": dict(lt=80)}
s2_datacube = local_conn.load_stac(
    url=url,
    spatial_extent=spatial_extent,
    temporal_extent=temporal_extent,
    bands=bands,
    properties=properties,
)

b04 = s2_datacube.band("red")
b08 = s2_datacube.band("nir")
ndvi = (b08 - b04) / (b08 + b04)

sample_geojson_string = """
{"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"coordinates":[[[11.348766682377743,46.48077235403869],[11.348766682377743,46.47721029746353],[11.35563673107842,46.47721029746353],[11.35563673107842,46.48077235403869],[11.348766682377743,46.48077235403869]]],"type":"Polygon"}},{"type":"Feature","properties":{},"geometry":{"coordinates":[[[11.365497271567875,46.48683844486996],[11.362870488240105,46.48441765486314],[11.365658919772045,46.48208023815508],[11.3688514718161,46.48305417398808],[11.367477462075641,46.485725451348884],[11.365497271567875,46.48683844486996]]],"type":"Polygon"}},{"type":"Feature","properties":{},"geometry":{"coordinates":[[[11.372809654880939,46.486050715872665],[11.371383328810936,46.484121615720795],[11.375203845069393,46.482332752634534],[11.37754709654618,46.48349025889712],[11.3788206018977,46.48548953006227],[11.375407607374086,46.486226085126646],[11.372809654880939,46.486050715872665]]],"type":"Polygon"}},{"type":"Feature","properties":{},"geometry":{"coordinates":[[[11.358546395621374,46.47956165241811],[11.358240754320718,46.478158509926004],[11.363283835883294,46.47563276206486],[11.365219564109765,46.47864961374469],[11.362723493271574,46.481245375637826],[11.358546395621374,46.47956165241811]]],"type":"Polygon"}},{"type":"Feature","properties":{},"geometry":{"coordinates":[[[11.35324861290701,46.48243798140945],[11.359004857403477,46.48170137505551],[11.360278362822669,46.483244729307046],[11.359157678053293,46.48527908392737],[11.355082460711827,46.48510371152537],[11.35324861290701,46.48243798140945]]],"type":"Polygon"}}]}
"""


sample_geojson = json.loads(sample_geojson_string)
#The data is in EPSG:32632, we need to convert the geoJSON in EPSG:4326 to that
geoms = gpd.GeoDataFrame.from_features(sample_geojson).set_crs("EPSG:4326").to_crs("EPSG:32632")

# Doesn't work due to Pydantic (AttributeError: 'dict' object has no attribute 'strip')
geojson_32632_dict = json.loads(geoms.to_json())
ndvi_stats = ndvi.aggregate_spatial(geometries = geojson_32632_dict, reducer = "median").execute()

# Doesn't work due to openeo-processes-dask, which expects a dict
geojson_32632_dict["crs"] = 32632
ndvi_stats = ndvi.aggregate_spatial(geometries = geojson_32632_dict, reducer = "median").execute()

@danielFlemstrom
Copy link
Author

Hmm, sadly i discovered this : "The support for specifying Coordinate Reference Systems (CRS) other than WGS 84 was removed from the GeoJSON specification in its 2016 version, RFC 7946, which mandates that all coordinates are assumed to be in the WGS 84 system. For applications needing a different CRS, transformations must occur outside the GeoJSON format, as it exclusively assumes WGS 84 coordinates."

As i understand it, this meanst that we cannot specify (other) CRS:s for polygons, only for the bounding box ("west", ....). However, since the rastercube still may be a of different CRS, i guess we still need to convert the polygon to the CRS of the rastercube (which should be possible once in geopandas).

@clausmichele
Copy link
Member

Hmm, sadly i discovered this : "The support for specifying Coordinate Reference Systems (CRS) other than WGS 84 was removed from the GeoJSON specification in its 2016 version, RFC 7946, which mandates that all coordinates are assumed to be in the WGS 84 system. For applications needing a different CRS, transformations must occur outside the GeoJSON format, as it exclusively assumes WGS 84 coordinates."

As i understand it, this meanst that we cannot specify (other) CRS:s for polygons, only for the bounding box ("west", ....). However, since the rastercube still may be a of different CRS, i guess we still need to convert the polygon to the CRS of the rastercube (which should be possible once in geopandas).

Exactly. geoJSON doesn't support other CRSs than EPSG:4326. The action points here are:

  1. Convert the CRS to the native of the datacube, like we did in the filter_spatial implementation here: https://github.com/Open-EO/openeo-processes-dask/blob/main/openeo_processes_dask/process_implementations/cubes/_filter.py#L122-L128
  2. Implement a process to load vector cubes from other files (like GeoParquet as mention by @soxofaan)

@soxofaan
Copy link
Member

soxofaan commented Apr 4, 2024

  1. Implement a process to load vector cubes from other files (like GeoParquet as mention by @soxofaan)

FYI: the current approach we use is load_url with a geoparquet file (which supports non-EPSG4326 CRSes).
Drawback of that approach is that it requires you to host your geometry on a publicly available URL, which is more cumbersome than just having inline GeoJSON.
As a workaround I propose in Open-EO/openeo-processes#498 to also support data URLs in load_url, allowing to embed binary data directly in process graphs, based on a well-established standard. Feel free to weigh in on the discussion there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants