Skip to content

Commit

Permalink
Dynamic time selection (#2055)
Browse files Browse the repository at this point in the history
### Pull Request Checklist:
- [x] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #1987
- [x] Tests for the changes have been added (for bug fixes / features)
- [x] (If applicable) Documentation has been added / updated (for bug
fixes / features)
- [x] CHANGELOG.rst has been updated (with summary of main changes)
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added

### What kind of change does this PR introduce?
The PR improves the `select_time` function providing the possibility to
select time based on `xr.DataArray` of days of year.
- Move `_get_doys` outside `select_time`
- Create `mask_between_doys` function
- Edit `select_time` to work with the new workflow to select time with
doys

### Other information:
I've run pytest but I had an error (ERROR
tests/test_sdba/test_adjustment.py::TestLoci::test_reduce_dims) and a
failed (FAILED tests/test_modules.py::test_encoding - AttributeError:
module '_locale' has no attribute 'nl_langinfo') but I don't know how to
fix this.
  • Loading branch information
aulemahal authored Feb 17, 2025
2 parents 84e9751 + a96ad8d commit 7cec9a4
Show file tree
Hide file tree
Showing 10 changed files with 293 additions and 77 deletions.
5 changes: 5 additions & 0 deletions .zenodo.json
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,11 @@
"name": "Lehner, Sebastian",
"affiliation": "GeoSphere Austria, Vienna, Austria",
"orcid": "0000-0002-7562-8172"
},
{
"name": "Hamon, Baptiste",
"affiliation": "University of Canterbury, Christchurch, New Zealand",
"orcid": "0009-0007-4530-9772"
}
],
"keywords": [
Expand Down
1 change: 1 addition & 0 deletions AUTHORS.rst
Original file line number Diff line number Diff line change
Expand Up @@ -48,3 +48,4 @@ Contributors
* Adrien Lamarche `@LamAdr <https://github.com/LamAdr>`_
* Faisal Mahmood <[email protected]> <[email protected]> `@faimahsho <https://github.com/faimahsho>`_
* Sebastian Lehner <[email protected]> `@seblehner <https://github.com/seblehner>`_
* Baptiste Hamon <[email protected]> `@baptistehamon <https://github.com/baptistehamon>`_
8 changes: 5 additions & 3 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Changelog

v0.55.0 (unreleased)
--------------------
Contributors to this version: Juliette Lavoie (:user:`juliettelavoie`), Trevor James Smith (:user:`Zeitsperre`), Sascha Hofmann (:user:`saschahofmann`), Pascal Bourgault (:user:`aulemahal`), Éric Dupuis (:user:`coxipi`).
Contributors to this version: Juliette Lavoie (:user:`juliettelavoie`), Trevor James Smith (:user:`Zeitsperre`), Sascha Hofmann (:user:`saschahofmann`), Pascal Bourgault (:user:`aulemahal`), Éric Dupuis (:user:`coxipi`), Baptiste Hamon (:user:`baptistehamon`).

Breaking changes
^^^^^^^^^^^^^^^^
Expand All @@ -29,16 +29,18 @@ New features and enhancements
* ``xclim.testing.helpers.test_timeseries`` now accepts a `calendar` argument that is forwarded to ``xr.cftime_range``. (:pull:`2019`).
* New ``xclim.indices.fao_allen98``, exporting the FAO-56 Penman-Monteith equation for potential evapotranspiration (:issue:`2004`, :pull:`2067`).
* Missing values method "pct" and "at_least_n" now accept a new "subfreq" option that allows to compute the missing mask in two steps. When given, the algorithm is applied at this "subfreq" resampling frequency first and then the result is resampled at the target indicator frequency. In the output, a period is invalid if any of its subgroup where flagged as invalid by the chosen method. (:pull:`2058`, :issue:`1820`).
* `rv_continuous` functions can now be given directly as the `dist` argument in ``standardized_precipitation_index`` and ``standardized_precipitation_evapotranspiration_index`` indicators. This includes `lmoments3` distribution when specifying `method="PWM"`. (:issue:`2043`, :pull:`2045`).
* ``rv_continuous`` functions can now be given directly as the ``dist`` argument in ``standardized_precipitation_index`` and ``standardized_precipitation_evapotranspiration_index`` indicators. This includes `lmoments3` distribution when specifying ``method="PWM"``. (:issue:`2043`, :pull:`2045`).
* Time selection in ``xclim.core.calendar.select_time`` and the ``**indexer`` argument of indicators now supports day-of-year bounds given as DataArrays with spatial and/or temporal dimensions. (:issue:`1987`, :pull:`2055`).

Internal changes
^^^^^^^^^^^^^^^^
* `sphinx-codeautolink` and `pygments` have been temporarily pinned due to breaking API changes. (:pull:`2030`).
* Adjusted the ``TestOfficialYaml`` test to use a dynamic method for finding the installed location of `xclim`. (:pull:`2028`).
* Adjusted two tests for better handling when running in Windows environments. (:pull:`2057`).
* Refactor of the ``xclim.core.missing`` module, usage of the ``Missing`` objects has been broken. (:pull:`2058`, :issue:`1820`, :issue:`2000`).
* Refactor of the ``xclim.core.missing`` module, usage of the ``Missing`` objects has been broken. (:pull:`2058`, :pull:`2055`, :pull:`2076`, :issue:`1820`, :issue:`2000`).
- Objects are initialized with their options and then called with the data, input frequency, target frequency and indexer.
- Subclasses receive non-resampled DataArray in their ``is_missing`` methods.
- Subclasses receive the array of valid timesteps ``valid`` instead of ``null``, the invalid ones.
- ``MissingWMO`` now uses ``xclim.indices.helpers.resample_map`` which should greatly improve performance in a dask context.
* There is now a warning stating that `fitkwargs` are not employed when using the `lmoments3` distribution. One exception is the use of `'floc'` which is allowed with the gamma distribution. `'floc'` is used to shift the distribution before computing fitting parameters with the `lmoments3` distribution since ``loc=0`` is always assumed in the library. (:issue:`2043`, :pull:`2045`).

Expand Down
5 changes: 3 additions & 2 deletions docs/notebooks/customize.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@
"\n",
"Finally, it is possible for advanced users to register their own methods. Xclim's missing methods are in fact class-based. To create a custom missing class, one should implement a subclass of `xclim.core.checks.MissingBase` and override at least the `is_missing` method. This method should take the following arguments:\n",
"\n",
"- `null`, a `DataArray` of the mask of invalid values in the input data array (with the same time coordinate as the raw data).\n",
"- `valid`, a `DataArray` of the mask of valid values in the input data array (with the same time coordinate as the raw data).\n",
"- `count`, `DataArray` of the number of days in each resampled periods\n",
"- `freq`, the resampling frequency.\n",
"\n",
Expand Down Expand Up @@ -185,8 +185,9 @@
" def __init__(self, max_n: int = 5):\n",
" super().__init__(max_n=max_n)\n",
"\n",
" def is_missing(self, null, count, freq):\n",
" def is_missing(self, valid, count, freq):\n",
" \"\"\"Return a boolean mask where True values are for elements that are considered missing and masked on the output.\"\"\"\n",
" null = ~valid\n",
" return (\n",
" null.resample(time=freq).map(longest_run, dim=\"time\")\n",
" >= self.options[\"max_n\"]\n",
Expand Down
157 changes: 140 additions & 17 deletions src/xclim/core/calendar.py
Original file line number Diff line number Diff line change
Expand Up @@ -1037,7 +1037,7 @@ def doy_to_days_since(
# 2cases:
# val is a day in the same year as its index : da - offset
# val is a day in the next year : da + doy_max - offset
out = xr.where(dac > base_doy, dac, dac + doy_max) - start_doy
out = xr.where(dac >= base_doy, dac, dac + doy_max) - start_doy
out.attrs.update(da.attrs)
if start is not None:
out.attrs.update(units=f"days after {start}")
Expand Down Expand Up @@ -1117,12 +1117,136 @@ def days_since_to_doy(
return out.convert_calendar(base_calendar).rename(da.name)


def _get_doys(start: int, end: int, inclusive: tuple[bool, bool]):
"""
Get the day of year list from start to end.
Parameters
----------
start : int
Start day of year.
end : int
End day of year.
inclusive : 2-tuple of booleans
Whether the bounds should be inclusive or not.
Returns
-------
np.ndarray
Array of day of year between the start and end.
"""
if start <= end:
doys = np.arange(start, end + 1)
else:
doys = np.concatenate((np.arange(start, 367), np.arange(0, end + 1)))
if not inclusive[0]:
doys = doys[1:]
if not inclusive[1]:
doys = doys[:-1]
return doys


def mask_between_doys(
da: xr.DataArray,
doy_bounds: tuple[int | xr.DataArray, int | xr.DataArray],
include_bounds: tuple[bool, bool] = [True, True],
) -> xr.DataArray | xr.Dataset:
"""
Mask the data outside the day of year bounds.
Parameters
----------
da : xr.DataArray or xr.Dataset
Input data. It must have a time coordinate.
doy_bounds : 2-tuple of integers or DataArray
The bounds as (start, end) of the period of interest expressed in day-of-year, integers going from
1 (January 1st) to 365 or 366 (December 31st).
If DataArrays are passed, they must have the same coordinates on the dimensions they share.
They may have a time dimension, in which case the masking is done independently for each period defined by the coordinate,
which means the time coordinate must have an inferable frequency (see :py:func:`xr.infer_freq`).
Timesteps of the input not appearing in the time coordinate of the bounds are masked as "outside the bounds".
Missing values (nan) in the start and end bounds default to 1 and 366 respectively in the non-temporal case
and to open bounds (the start and end of the period) in the temporal case.
include_bounds : 2-tuple of booleans
Whether the bounds of `doy_bounds` should be inclusive or not.
Returns
-------
xr.DataArray
Boolean array with the same time coordinate as `da` and any other dimension present on the bounds.
True value inside the period of interest and False outside.
"""
if isinstance(doy_bounds[0], int) and isinstance(doy_bounds[1], int): # Simple case
mask = da.time.dt.dayofyear.isin(_get_doys(*doy_bounds, include_bounds))
else:
start, end = doy_bounds
# convert ints to DataArrays
if isinstance(start, int):
start = xr.full_like(end, start)
elif isinstance(end, int):
end = xr.full_like(start, end)
# Ensure they both have the same dims
# align join='exact' will fail on common but different coords, broadcast will add missing coords
start, end = xr.broadcast(*xr.align(start, end, join="exact"))

if not include_bounds[0]:
start += 1
if not include_bounds[1]:
end -= 1

if "time" in start.dims:
freq = xr.infer_freq(start.time)
# Convert the doy bounds to a duration since the beginning of each period defined in the bound's time coordinate
# Also ensures the bounds share the sime time calendar as the input
# Any missing value is replaced with the min/max of possible values
calkws = dict(
calendar=da.time.dt.calendar, use_cftime=(da.time.dtype == "O")
)
start = doy_to_days_since(start.convert_calendar(**calkws)).fillna(0)
end = doy_to_days_since(end.convert_calendar(**calkws)).fillna(366)

out = []
# For each period, mask the days since between start and end
for base_time, indexes in da.resample(time=freq).groups.items():
group = da.isel(time=indexes)

if base_time in start.time:
start_d = start.sel(time=base_time)
end_d = end.sel(time=base_time)

# select days between start and end for group
days = (group.time - base_time).dt.days
days = days.where(days >= 0)
mask = (days >= start_d) & (days <= end_d)
else: # This group has no defined bounds : put False in the mask
# Array with the same shape as the "mask" in the other case : broadcast of time and bounds dims
template = xr.broadcast(
group.time.dt.day, start.isel(time=0, drop=True)
)[0]
mask = xr.full_like(template, False, dtype="bool")
out.append(mask)
mask = xr.concat(out, dim="time")
else: # Only "Spatial" dims, we can't constrain as in days since, so there are two cases
doys = da.time.dt.dayofyear # for readability
# Any missing value is replaced with the min/max of possible values
start = start.fillna(1)
end = end.fillna(366)
mask = xr.where(
start <= end,
# case 1 : start <= end, ROI is within a calendar year
(doys >= start) & (doys <= end),
# case 2 : start > end, ROI crosses the new year
~((doys > end) & (doys < start)),
)
return mask


def select_time(
da: xr.DataArray | xr.Dataset,
drop: bool = False,
season: str | Sequence[str] | None = None,
month: int | Sequence[int] | None = None,
doy_bounds: tuple[int, int] | None = None,
doy_bounds: tuple[int | xr.DataArray, int | xr.DataArray] | None = None,
date_bounds: tuple[str, str] | None = None,
include_bounds: bool | tuple[bool, bool] = True,
) -> DataType:
Expand All @@ -1138,14 +1262,16 @@ def select_time(
da : xr.DataArray or xr.Dataset
Input data.
drop : bool
Whether to drop elements outside the period of interest or to simply mask them (default).
Whether to drop elements outside the period of interest (True) or to simply mask them (False, default).
This option is incompatible with passing array-like doy_bounds.
season : str or sequence of str, optional
One or more of 'DJF', 'MAM', 'JJA' and 'SON'.
month : int or sequence of int, optional
Sequence of month numbers (January = 1 ... December = 12).
doy_bounds : 2-tuple of int, optional
doy_bounds : 2-tuple of int or xr.DataArray, optional
The bounds as (start, end) of the period of interest expressed in day-of-year, integers going from
1 (January 1st) to 365 or 366 (December 31st).
1 (January 1st) to 365 or 366 (December 31st). If a combination of int and xr.DataArray is given,
the int day-of-year corresponds to the year of the xr.DataArray.
If calendar awareness is needed, consider using ``date_bounds`` instead.
date_bounds : 2-tuple of str, optional
The bounds as (start, end) of the period of interest expressed as dates in the month-day (%m-%d) format.
Expand Down Expand Up @@ -1187,17 +1313,6 @@ def select_time(
if N == 0:
return da

def _get_doys(_start, _end, _inclusive):
if _start <= _end:
_doys = np.arange(_start, _end + 1)
else:
_doys = np.concatenate((np.arange(_start, 367), np.arange(0, _end + 1)))
if not _inclusive[0]:
_doys = _doys[1:]
if not _inclusive[1]:
_doys = _doys[:-1]
return _doys

if isinstance(include_bounds, bool):
include_bounds = (include_bounds, include_bounds)

Expand All @@ -1212,7 +1327,15 @@ def _get_doys(_start, _end, _inclusive):
mask = da.time.dt.month.isin(month)

elif doy_bounds is not None:
mask = da.time.dt.dayofyear.isin(_get_doys(*doy_bounds, include_bounds))
if (
not (isinstance(doy_bounds[0], int) and isinstance(doy_bounds[1], int))
and drop
):
# At least one of those is an array, this drop won't work
raise ValueError(
"Passing array-like doy bounds is incompatible with drop=True."
)
mask = mask_between_doys(da, doy_bounds, include_bounds)

elif date_bounds is not None:
# This one is a bit trickier.
Expand Down
Loading

0 comments on commit 7cec9a4

Please sign in to comment.