Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load_collection and load_stac: Clarify the dimension names and labels #491

Open
wants to merge 3 commits into
base: draft
Choose a base branch
from

Conversation

m-mohr
Copy link
Member

@m-mohr m-mohr commented Jan 3, 2024

Fixes #488 and #489

@m-mohr m-mohr added this to the 2.0.0 milestone Jan 3, 2024
@m-mohr m-mohr requested a review from soxofaan January 3, 2024 14:31
@m-mohr m-mohr marked this pull request as ready for review January 3, 2024 14:31
@m-mohr m-mohr linked an issue Jan 3, 2024 that may be closed by this pull request
@m-mohr m-mohr linked an issue Jan 3, 2024 that may be closed by this pull request
@m-mohr m-mohr changed the title load_collection and load_stac: Clarify the order of dimension labels for nominal labels. #488 load_collection and load_stac: Clarify the dimension names and labels Jan 3, 2024
@m-mohr m-mohr requested a review from clausmichele January 3, 2024 14:38
load_collection.json Outdated Show resolved Hide resolved
@@ -1,7 +1,7 @@
{
"id": "load_stac",
"summary": "Loads data from STAC",
"description": "Loads data from a static STAC catalog or a STAC API Collection and returns the data as a processable data cube. A batch job result can be loaded by providing a reference to it.\n\nIf supported by the underlying metadata and file format, the data that is added to the data cube can be restricted with the parameters `spatial_extent`, `temporal_extent` and `bands`. If no data is available for the given extents, a `NoDataAvailable` exception is thrown.\n\n**Remarks:**\n\n* The bands (and all dimensions that specify nominal dimension labels) are expected to be ordered as specified in the metadata if the `bands` parameter is set to `null`.\n* If no additional parameter is specified this would imply that the whole data set is expected to be loaded. Due to the large size of many data sets, this is not recommended and may be optimized by back-ends to only load the data that is actually required after evaluating subsequent processes such as filters. This means that the values should be processed only after the data has been limited to the required extent and as a consequence also to a manageable size.",
"description": "Loads data from a static STAC catalog or a STAC API Collection and returns the data as a processable data cube. A batch job result can be loaded by providing a reference to it.\n\nIf supported by the underlying metadata and file format, the data that is added to the data cube can be restricted with the parameters `spatial_extent`, `temporal_extent` and `bands`. If no data is available for the given extents, a `NoDataAvailable` exception is thrown.\n\n**Remarks:**\n\n* The dimensions (e.g. names) follow the data cube metadata (`cube:dimensions`), if present. Otherwise, it tries to preserve any dimension names available in the files. Otherwise, it falls back to the recommended dimension names as specified in the openEO API (`x`, `y`, `z`, `t`, `bands`, `geometry`).\n* All dimensions that specify nominal dimension labels (e.g. bands) are expected to be ordered as specified in the data cube metadata (`cube:dimensions`) unless otherwise specified in a corresponding parameter (e.g. `bands`). If no data cube metadata is present, the order of the dimension labels will reflect the structure in the files.\n* If no additional parameter is specified this would imply that the whole data set is expected to be loaded. Due to the large size of many data sets, this is not recommended and may be optimized by back-ends to only load the data that is actually required after evaluating subsequent processes such as filters. This means that the values should be processed only after the data has been limited to the required extent and as a consequence also to a manageable size.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no data cube metadata is present, the order of the dimension labels will reflect the structure in the files.

Isn't it safer to say that the order will be undefined?
Otherwise users might be tricked into depending on implementation details that can suddenly change. Also, "structure in the files" might be not as straightforward practically as it sounds (e.g. multiple files where band order is different).

If a predictable order is important, we might also prescribe to use alphabetical order if no other reliable band order source is available.

Copy link
Member Author

@m-mohr m-mohr Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For us, maybe. For the user, undefined is the worst case scenario, I think.

load_stac is anyway implementation heavily implementation detail dependant, the order of the bands is just a tiny detail of it. If you change band order in the source files, you may change a lot more (think of recent S2 changes with the offsets etc). So I don't see an issue here (at least not only for bands).

Alphabetical order as in B1, B10, A2 or B1, B2, B10? ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the user, undefined is the worst case scenario, I think.

I'm not sure. I'd rather prefer an simple honest statement that the order is undefined than getting the impression that the order is static (unless you understand the fine print with a lot of technical jargon).

But anyway, I guess the main message should be: dear user, if you care about the band order, specify it explicitly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if alphabetical is an option I would keep it simple to pure alphabetical: B1, B10, B2. The main goal is to be predictable, not trying to guess what makes most sense for humans.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully agree with Stefaan. And having an explicit statement telling that for some edge cases the order can't be defined automatically and the best practice consist in defining explicitly the bands to load and their order.

@m-mohr m-mohr requested a review from soxofaan January 4, 2024 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

load_stac: dimension names load_stac: band order remark
3 participants