
merge_cubes: The merge_cubes function doesn't merge the temporal range of the previous cube. #861

Closed · Pratichhya opened this issue Sep 10, 2024 · 8 comments · Fixed by #871

@Pratichhya commented Sep 10, 2024
While creating a use case that combines an external STAC collection with an available openEO collection, I encountered an issue where the temporal range of the preceding STAC cube wasn't included in the merge result. (DEO-104)

```python
import openeo

connection = openeo.connect("openeo.dataspace.copernicus.eu").authenticate_oidc()

sentinel_extent = ["2017-01-01", "2020-12-30"]
landsat_extent = ["2014-01-01", "2016-12-31"]
spatial_extent = {"west": 5.0, "south": 51.21, "east": 5.05, "north": 51.25}

s2_cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=sentinel_extent,
    spatial_extent=spatial_extent,
    bands=["B04", "B08"]
)

landsat = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-c2-l2"
landsat_cube = connection.load_stac(
    landsat,
    spatial_extent=spatial_extent,
    temporal_extent=landsat_extent,
    bands=["OLI_B4", "OLI_B5"],
    properties={"platform": lambda x: x == "landsat-8"},
)

s2_cube = s2_cube.ndvi()
landsat_cube = landsat_cube.ndvi(nir="OLI_B5", red="OLI_B4")

merged_cube = s2_cube.merge_cubes(landsat_cube)
merged_temporal = merged_cube.aggregate_temporal_period(period="season", reducer="mean")
timeseries_job = merged_temporal.execute_batch(title="landsats2_NDVI", outputfile="Merged_Landsat8Sen2.nc")
```

The received output has the following time dimension:

```
array(['2016-12-01T00:00:00.000000000', '2017-03-01T00:00:00.000000000',
       '2017-06-01T00:00:00.000000000', '2017-09-01T00:00:00.000000000',
       '2017-12-01T00:00:00.000000000', '2018-03-01T00:00:00.000000000',
       '2018-06-01T00:00:00.000000000', '2018-09-01T00:00:00.000000000',
       '2018-12-01T00:00:00.000000000', '2019-03-01T00:00:00.000000000',
       '2019-06-01T00:00:00.000000000', '2019-09-01T00:00:00.000000000',
       '2019-12-01T00:00:00.000000000', '2020-03-01T00:00:00.000000000',
       '2020-06-01T00:00:00.000000000', '2020-09-01T00:00:00.000000000',
       '2020-12-01T00:00:00.000000000'], dtype='datetime64[ns]')
```

I also tested the same example using load_collection for both collections, and the issue persists.
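For reference, the merged cube's temporal extent should be the union of the two input extents, which would put the first aggregation period in early 2014 rather than December 2016. A minimal sketch of that expected metadata behaviour (plain Python; `temporal_union` is a hypothetical helper, not part of the openeo client API):

```python
# Sketch of the expected metadata behaviour: the temporal extent of a
# merged cube should cover both inputs. The helper name is hypothetical,
# not part of the openeo client API.
def temporal_union(extent_a, extent_b):
    """Union of two [start, end] extents given as ISO-8601 date strings."""
    return [min(extent_a[0], extent_b[0]), max(extent_a[1], extent_b[1])]

sentinel_extent = ["2017-01-01", "2020-12-30"]
landsat_extent = ["2014-01-01", "2016-12-31"]

print(temporal_union(sentinel_extent, landsat_extent))
# → ['2014-01-01', '2020-12-30']
```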
@jdries (Contributor) commented Sep 11, 2024

I could reproduce it on a much simpler graph. It looks like a critical issue that's not necessarily related to the issues mentioned above:

```json
{
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      "arguments": {
        "bands": [
          "NDVI"
        ],
        "id": "CGLS_NDVI_V3_GLOBAL",
        "spatial_extent": {
          "east": 51.64285714317316,
          "north": 12.214285714301129,
          "south": -1.8839285714099527,
          "west": 40.76785714315832
        },
        "temporal_extent": [
          "2020-05-01",
          "2020-06-30"
        ]
      }
    },
    "loadcollection2": {
      "process_id": "load_collection",
      "arguments": {
        "bands": [
          "NDVI"
        ],
        "id": "CGLS_NDVI300_V2_GLOBAL",
        "spatial_extent": {
          "east": 51.660714286030306,
          "north": 12.232142857158271,
          "south": -1.9017857142670955,
          "west": 40.75000000030117
        },
        "temporal_extent": [
          "2020-07-01",
          "2021-09-01"
        ]
      }
    },
    "merge1": {
      "process_id": "merge_cubes",
      "arguments": {
        "cube1": {
          "from_node": "loadcollection1"
        },
        "cube2": {
          "from_node": "loadcollection2"
        },
        "overlap_resolver": {
          "process_graph": {
            "max1": {
              "process_id": "max",
              "arguments": {
                "data": {
                  "from_parameter": "x"
                }
              },
              "result": true
            }
          }
        }
      }
    },
    "save2": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "merge1"
        },
        "format": "NETCDF"
      },
      "result": true
    }
  },
  "parameters": []
}
```

@jdries (Contributor) commented Sep 11, 2024

```
j-240911e9fb77427f99233b30551182f3
Merging cubes with spatial indices: SparseSpaceTimePartitioner 6 false - SparseSpaceTimePartitioner 6 false
```

@jdries (Contributor) commented Sep 11, 2024

The problem starts here:
https://github.com/Open-EO/openeo-geotrellis-extensions/blob/f33dc1252991b5630b65c3a88fbf5872b322765c/openeo-geotrellis/src/main/scala/org/openeo/geotrellis/OpenEOProcesses.scala#L832

When the cube is resampled, it receives the partitioner of the other cube, which may not be usable.
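As a toy illustration of why borrowing the other cube's partitioner can go wrong (plain Python, not the actual Geotrellis code): if the borrowed partitioner's key index doesn't cover the resampled cube's keys, those records never land in any partition.

```python
# Toy model of a partitioner backed by a fixed key index. This is NOT the
# Geotrellis implementation — just an illustration of how a partitioner
# built for one cube's keys can silently lose another cube's records.
class ToyPartitioner:
    def __init__(self, indexed_keys):
        # Index built from one cube's keys (e.g. its time instants).
        self.index = {k: i for i, k in enumerate(sorted(indexed_keys))}

    def partition(self, records):
        kept, dropped = [], []
        for key, value in records:
            if key in self.index:      # key covered by the index
                kept.append((self.index[key], value))
            else:                      # key unknown to this partitioner
                dropped.append(key)
        return kept, dropped

# Partitioner built for cube2's key range (2020-07 .. 2021-09) ...
partitioner = ToyPartitioner(["2020-07", "2020-08", "2021-09"])
# ... applied to cube1's resampled records (2020-05 .. 2020-06):
kept, dropped = partitioner.partition([("2020-05", "t1"), ("2020-06", "t2")])
print(dropped)  # → ['2020-05', '2020-06']: both keys fall outside the index
```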

@jdries jdries self-assigned this Sep 11, 2024
jdries added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue Sep 11, 2024
@jdries (Contributor) commented Sep 12, 2024

Second issue: after merge_cubes of two cubes with a time dimension, the metadata should be updated to contain the union of the two input time ranges. Otherwise, aggregate_temporal_period starts from an incorrect input range, resulting in loss of data.
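To illustrate this second issue with a sketch (plain Python, a hypothetical helper, not the backend implementation): if the list of "season" aggregation periods is derived from the merged cube's metadata extent, a stale extent that still reflects only one input loses every season that falls outside it.

```python
from datetime import date

# Hypothetical sketch, not the backend code: "season" aggregation periods
# start on 1 Mar, 1 Jun, 1 Sep and 1 Dec. The period list is derived from
# the cube's temporal extent, so a stale extent covering only one input
# drops the earlier seasons entirely.
def season_starts(start_iso, end_iso):
    start, end = date.fromisoformat(start_iso), date.fromisoformat(end_iso)
    starts = []
    for year in range(start.year - 1, end.year + 1):
        for month in (3, 6, 9, 12):
            d = date(year, month, 1)
            if start <= d <= end:
                starts.append(d.isoformat())
    return starts

full = season_starts("2014-01-01", "2020-12-30")   # union of both inputs
stale = season_starts("2017-01-01", "2020-12-30")  # Sentinel-2 extent only
print(len(full) - len(stale))  # → 12 seasons lost if metadata is not merged
```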

@bossie (Collaborator) commented Sep 12, 2024

> Second issue: after merge_cubes of 2 cubes with time dimension, the metadata should be updated to contain the union of the two input time ranges. If not done, aggregate_temporal_period will start from an incorrect input range, resulting in loss of data.

FYI I'm currently implementing this in the context of #852.

@jdries jdries assigned bossie and unassigned jdries Sep 16, 2024
bossie added a commit that referenced this issue Sep 16, 2024
@bossie (Collaborator) commented Sep 16, 2024

Pushed a fix, needs a test.

bossie added a commit that referenced this issue Sep 16, 2024
* propagate temporal extent from load_stac

#852

* fix tests

#852

* add test

#852

* quick fix for new test crashing the JVM when running all tests

  (garbled native stack trace from the core dump omitted)

#852

* reference fixed test data

#852

* debug KeyError: start_datetime

#852

* reference openeo_driver with fix

#852

* simplify filter_temporal

#852

* DRY: move GDALWarp.deinit() elsewhere

#852

* merge_cubes: merge temporal extents

#852
#861
bossie added a commit that referenced this issue Sep 17, 2024
@bossie bossie linked a pull request Sep 17, 2024 that will close this issue
@bossie bossie mentioned this issue Sep 17, 2024
@bossie (Collaborator) commented Sep 18, 2024

@Pratichhya this is available on staging. Let me know if you encounter any issues.


4 participants