Filter data by observation time upon loading #2974

adybbroe · 2024-11-08T15:42:56Z

Feature Request

Is your feature request related to a problem? Please describe.

This issue came up while working with early Arctic Weather Satellite (AWS1) data. AWS1 was launched on Aug 16, 2024. The data are not yet released, but I got a few orbits for the EUM conf 2024. The issue applies to all data, though, and in particular to all global polar-orbiting satellites, for instance the NOAA GAC AVHRR data, where a file may span an entire orbit (or a bit more).

Suppose you want to look at the area where the orbit starts and ends, like in the case of the image below:

The start of this orbit is 19:12 UTC, but the end is ~95 minutes later. So if you want a snapshot of the data over an area of interest, you do not want to blend the start and the end of the orbit; you want either just the beginning or just the end!

So, simply reading all the data and remapping to the area, as in the examples below, doesn't really work:

Describe the solution you'd like

So, what I would have liked in this case is an option to read only the data within a certain time window, for example:

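from datetime import datetime
from satpy import Scene
# Proposed usage: the start_time/end_time keywords on load() are the requested feature
scn = Scene(filenames=FILENAMES, reader='aws_l1b_nc')
composites = ['mw183_humidity']
scn.load(composites, start_time=datetime(2024, 9, 13, 19, 12), end_time=datetime(2024, 9, 13, 19, 22))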

Describe any changes to existing user workflow
I don't believe any changes should be needed.

Additional context
I know that we can usually do this filtering by choosing only the (granule) files that are relevant for what we want to do. That is fine when the data are segmented into a few minutes of observations each, but it doesn't work when the data come as one file covering an entire orbit.


djhoese · 2024-11-11T16:33:26Z

Theoretically this should be possible as part of xarray's "coords" interface. That is, if you know the times you want and the reader adds a time coordinate such as data_arr.coords["time"], then you should be able to use xarray's indexing (slicing, .isel, .loc, or similar) to slice out the portion you want based on that coordinate variable. Some complications:

Most readers don't add this time coordinate.
I don't think we've established a consistent naming for the variable regardless.
Because we store geolocation information in a non-xarray-standard .attrs["area"], you'd have to get the slicing information (somehow) for the data and then apply it to the SwathDefinition separately. So slice the data DataArray, then slice the SwathDefinition, and finally re-assign that swath definition to the new DataArray's .attrs["area"]. I don't think the Satpy Scene currently allows high-level access to .loc for all the DataArrays in the Scene, since we have multiple resolutions in one Scene container and that type of indexing would be difficult.

Xarray indexing and selecting:

https://docs.xarray.dev/en/stable/user-guide/indexing.html
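The slicing steps described above could look roughly like the sketch below. It is only an illustration on synthetic data, not something Satpy does today: the helper `slice_by_time`, the coordinate name `"time"`, and the `("y", "x")` dimension names are all assumptions, and the longitudes/latitudes are passed as plain DataArrays instead of being taken from a real SwathDefinition.

```python
from datetime import datetime

import numpy as np
import xarray as xr


def slice_by_time(data_arr, lons, lats, start_time, end_time, time_coord="time"):
    """Keep only the scanlines observed inside [start_time, end_time].

    Hypothetical helper: assumes dims ("y", "x") and a per-scanline time
    coordinate called ``time_coord`` attached along "y".
    """
    times = data_arr.coords[time_coord]
    in_window = (times >= np.datetime64(start_time)) & (times <= np.datetime64(end_time))
    rows = np.flatnonzero(in_window.values)
    if rows.size == 0:
        raise ValueError("No scanlines fall inside the requested time window")
    # Scanline times increase along the orbit, so the matching rows are contiguous.
    y_slice = slice(int(rows[0]), int(rows[-1]) + 1)
    # The same slice has to be applied to the data and to the geolocation.
    return data_arr.isel(y=y_slice), lons.isel(y=y_slice), lats.isel(y=y_slice)


# Synthetic "orbit": 95 scanlines, one per minute, starting at 19:12 UTC.
n_lines, n_pixels = 95, 4
line_times = np.datetime64("2024-09-13T19:12:00") + np.arange(n_lines) * np.timedelta64(60, "s")
data = xr.DataArray(
    np.arange(n_lines * n_pixels, dtype=float).reshape(n_lines, n_pixels),
    dims=("y", "x"),
    coords={"time": ("y", line_times)},
)
lons = xr.DataArray(np.zeros((n_lines, n_pixels)), dims=("y", "x"))
lats = xr.DataArray(np.zeros((n_lines, n_pixels)), dims=("y", "x"))

subset, sub_lons, sub_lats = slice_by_time(
    data, lons, lats, datetime(2024, 9, 13, 19, 12), datetime(2024, 9, 13, 19, 22)
)
print(subset.shape)

# With real Satpy data the last step would be to rebuild the geolocation, e.g.
#   subset.attrs["area"] = pyresample.geometry.SwathDefinition(lons=sub_lons, lats=sub_lats)
```

Doing this per DataArray sidesteps the multiple-resolution problem only partially: each resolution in a Scene would need its own time coordinate and its own slice.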

adybbroe self-assigned this Nov 8, 2024

adybbroe added the enhancement (code enhancements, features, improvements) and component:readers labels Nov 8, 2024
