`load_collection` and `load_stac`: Clarify the dimension names and labels #491

m-mohr · 2024-01-03T14:31:23Z

…els for nominal labels. #488

load_collection.json

soxofaan · 2024-01-04T16:08:11Z

proposals/load_stac.json

@@ -1,7 +1,7 @@
 {
    "id": "load_stac",
    "summary": "Loads data from STAC",
-    "description": "Loads data from a static STAC catalog or a STAC API Collection and returns the data as a processable data cube. A batch job result can be loaded by providing a reference to it.\n\nIf supported by the underlying metadata and file format, the data that is added to the data cube can be restricted with the parameters `spatial_extent`, `temporal_extent` and `bands`. If no data is available for the given extents, a `NoDataAvailable` exception is thrown.\n\n**Remarks:**\n\n* The bands (and all dimensions that specify nominal dimension labels) are expected to be ordered as specified in the metadata if the `bands` parameter is set to `null`.\n* If no additional parameter is specified this would imply that the whole data set is expected to be loaded. Due to the large size of many data sets, this is not recommended and may be optimized by back-ends to only load the data that is actually required after evaluating subsequent processes such as filters. This means that the values should be processed only after the data has been limited to the required extent and as a consequence also to a manageable size.",
+    "description": "Loads data from a static STAC catalog or a STAC API Collection and returns the data as a processable data cube. A batch job result can be loaded by providing a reference to it.\n\nIf supported by the underlying metadata and file format, the data that is added to the data cube can be restricted with the parameters `spatial_extent`, `temporal_extent` and `bands`. If no data is available for the given extents, a `NoDataAvailable` exception is thrown.\n\n**Remarks:**\n\n* The dimensions (e.g. names) follow the data cube metadata (`cube:dimensions`), if present. Otherwise, it tries to preserve any dimension names available in the files. Otherwise, it falls back to the recommended dimension names as specified in the openEO API (`x`, `y`, `z`, `t`, `bands`, `geometry`).\n* All dimensions that specify nominal dimension labels (e.g. bands) are expected to be ordered as specified in the data cube metadata (`cube:dimensions`) unless otherwise specified in a corresponding parameter (e.g. `bands`). If no data cube metadata is present, the order of the dimension labels will reflect the structure in the files.\n* If no additional parameter is specified this would imply that the whole data set is expected to be loaded. Due to the large size of many data sets, this is not recommended and may be optimized by back-ends to only load the data that is actually required after evaluating subsequent processes such as filters. This means that the values should be processed only after the data has been limited to the required extent and as a consequence also to a manageable size.",
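The first new remark defines a precedence chain for dimension names. A minimal sketch of that chain follows. It assumes `cube:dimensions` has been parsed into a dict keyed by dimension name, as in the STAC datacube extension, and that the file reader reports its own dimension names. The function name, its parameters and the `(type, axis)` keys of `RECOMMENDED` are hypothetical and not part of any openEO API.

```python
from typing import Optional

# Recommended openEO dimension names, keyed by a hypothetical (type, axis) pair
# loosely modelled on the STAC datacube extension's dimension types.
RECOMMENDED = {
    ("spatial", "x"): "x",
    ("spatial", "y"): "y",
    ("spatial", "z"): "z",
    ("temporal", None): "t",
    ("bands", None): "bands",
    ("geometry", None): "geometry",
}


def resolve_dimension_names(
    cube_dimensions: Optional[dict],
    file_dimension_names: Optional[list[str]],
    kinds: list[tuple[str, Optional[str]]],
) -> list[str]:
    """Hypothetical sketch of the naming fallback described in the remark."""
    # 1. Data cube metadata (cube:dimensions) wins: its keys are the dimension names.
    if cube_dimensions:
        return list(cube_dimensions.keys())
    # 2. Otherwise, preserve the dimension names found in the files.
    if file_dimension_names:
        return list(file_dimension_names)
    # 3. Otherwise, fall back to the recommended openEO names.
    return [RECOMMENDED[kind] for kind in kinds]
```

For example, a collection without `cube:dimensions` whose files expose no usable names would end up with `["t", "bands", "y", "x"]` for `kinds=[("temporal", None), ("bands", None), ("spatial", "y"), ("spatial", "x")]`.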


If no data cube metadata is present, the order of the dimension labels will reflect the structure in the files.

Isn't it safer to say that the order is undefined?
Otherwise, users might be tricked into depending on implementation details that can suddenly change. Also, "structure in the files" might not be as straightforward in practice as it sounds (e.g. multiple files where the band order differs).

If a predictable order is important, we could also prescribe alphabetical order when no other reliable source for the band order is available.

For us, maybe. For the user, undefined is the worst case scenario, I think.

`load_stac` depends heavily on implementation details anyway, and the band order is only a tiny part of that. If you change the band order in the source files, you may change a lot more (think of the recent S2 changes with the offsets, etc.). So I don't see an issue here (at least not one that is specific to bands).

Alphabetical order as in B1, B10, A2 or B1, B2, B10? ;-)

For the user, undefined is the worst case scenario, I think.

I'm not sure. I'd rather have a simple, honest statement that the order is undefined than give the impression that the order is static (unless you understand the fine print, which is full of technical jargon).

But anyway, I guess the main message should be: dear user, if you care about the band order, specify it explicitly

And if alphabetical order is an option, I would keep it simple and use pure alphabetical order: B1, B10, B2. The main goal is to be predictable, not to guess what makes the most sense for humans.
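The difference between the two orders mentioned above (B1, B10, B2 versus B1, B2, B10) can be shown with a small sketch. Plain `sorted` gives pure lexicographic order. The `natural_key` helper is a hypothetical illustration of a "human-friendly" natural sort, which is exactly the kind of guessing the comment argues against.

```python
import re

labels = ["B10", "B2", "A2", "B1"]

# Pure alphabetical (lexicographic) order: simple and fully predictable.
lexicographic = sorted(labels)


def natural_key(label: str) -> list:
    # re.split with a capturing group alternates non-digit and digit runs,
    # so every position holds the same type and comparisons never mix str and int.
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", label)]


# "Natural" order: numeric runs compared as numbers, so "B2" sorts before "B10".
natural = sorted(labels, key=natural_key)

print(lexicographic)  # ['A2', 'B1', 'B10', 'B2']
print(natural)        # ['A2', 'B1', 'B2', 'B10']
```

Lexicographic order needs no extra rules, so every back-end would produce the same result from the same set of labels.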

I fully agree with Stefaan. I would also add an explicit statement that in some edge cases the order can't be determined automatically, and that the best practice is to explicitly define the bands to load and their order.
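The label-order rule under discussion can be sketched as a precedence chain. An explicit `bands` parameter comes first, then the order from `cube:dimensions`. The fallback applies when neither is available. The fallback shown here is the pure alphabetical order proposed in this thread, not the "structure in the files" wording of the current diff. The function and parameter names are hypothetical, and `file_bands` is assumed to be the union of band names collected from all files, in whatever order they were encountered.

```python
from typing import Optional


def resolve_band_order(
    requested_bands: Optional[list[str]],
    metadata_bands: Optional[list[str]],
    file_bands: list[str],
) -> list[str]:
    """Hypothetical sketch of the nominal-label ordering discussed in #488."""
    # 1. An explicit `bands` parameter always defines the order (the recommended practice).
    if requested_bands is not None:
        return list(requested_bands)
    # 2. Otherwise, follow the order given in the data cube metadata (cube:dimensions).
    if metadata_bands:
        return list(metadata_bands)
    # 3. Fallback proposed in this thread: deduplicate, then sort purely alphabetically,
    #    so the result does not depend on how bands are ordered across files.
    return sorted(set(file_bands))
```

Because the fallback sorts a deduplicated set, two files that list the same bands in different orders still yield one deterministic order. The point made above still holds: only an explicit `bands` list guarantees the order the user actually wants.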

load_collection and load_stac: Clarify the order of dimension lab…

e31192d

…els for nominal labels. #488

m-mohr added this to the 2.0.0 milestone Jan 3, 2024

m-mohr requested a review from soxofaan January 3, 2024 14:31

m-mohr marked this pull request as ready for review January 3, 2024 14:31

m-mohr linked an issue Jan 3, 2024 that may be closed by this pull request

load_stac: band order remark #488

Open

load_collection and load_stac: Clarify the dimension naming. #489

c03ee1e

m-mohr linked an issue Jan 3, 2024 that may be closed by this pull request

load_stac: dimension names #489

Open

m-mohr changed the title ~~load_collection and load_stac: Clarify the order of dimension labels for nominal labels. #488~~ load_collection and load_stac: Clarify the dimension names and labels Jan 3, 2024

m-mohr requested a review from clausmichele January 3, 2024 14:38

soxofaan reviewed Jan 4, 2024


soxofaan mentioned this pull request Jan 4, 2024

load_stac: band order remark #488

Open

Remove "expected to be"

4460104

m-mohr requested a review from soxofaan January 4, 2024 17:46
