Adding statistics to Raster #638

vschaffn · 2024-12-03T17:46:13Z

Description:

This pull request introduces enhancements to the statistics calculation methods in the project. The goal is to provide users with more flexible and comprehensive statistics functionality for raster data. The get_stats method has been created to:

- Return a dict of statistics: mean, median, max, min, sum, sum of squares, 90th percentile, NMAD, RMSE and std.
- Support a single statistic: users can request one statistic.
- Support multiple statistics: users can request several statistics in one call.
- Support custom callables: users can pass any callable to compute a custom statistic.

A list of aliases (for example, "Average" for "mean") has been defined so that users can request statistics as flexibly as possible.
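The dispatch described above (all statistics, one name, a list, or a callable, with alias matching) can be sketched on a plain NumPy or masked array. This is a minimal, hypothetical stand-in, not the code in geoutils/raster/raster.py: the alias table, the reduced set of five statistics and the name-normalisation rule (lower-case, spaces and underscores removed) are assumptions for illustration. The NaN-plus-warning fallback follows the behaviour described for _get_single_stat.

```python
import warnings

import numpy as np

# Hypothetical canonical statistics (a subset of those listed above).
STATS = {
    "mean": np.nanmean,
    "median": np.nanmedian,
    "max": np.nanmax,
    "min": np.nanmin,
    "std": np.nanstd,
}

# Hypothetical aliases, keyed by normalised name (lower-case, no spaces/underscores).
ALIASES = {
    "mean": "mean", "average": "mean",
    "median": "median",
    "max": "max", "maximum": "max",
    "min": "min", "minimum": "min",
    "std": "std", "stddev": "std", "standarddeviation": "std",
}


def _as_nan_array(data):
    """Turn masked pixels into NaN so that nan-aware reductions skip them."""
    if np.ma.isMaskedArray(data):
        return data.astype(float).filled(np.nan)
    return np.asarray(data, dtype=float)


def get_stats(data, stats_name=None):
    """Sketch: None -> dict of all stats, str/callable -> one value, list -> dict."""
    arr = _as_nan_array(data)
    if stats_name is None:
        return {name: float(func(arr)) for name, func in STATS.items()}
    if isinstance(stats_name, (list, tuple)):
        # Callables are keyed by their function name, strings by the name given.
        return {getattr(s, "__name__", s): get_stats(data, s) for s in stats_name}
    if callable(stats_name):
        return float(stats_name(arr))
    key = ALIASES.get(stats_name.lower().replace(" ", "").replace("_", ""))
    if key is None:
        # Unknown name: warn and return NaN instead of raising.
        warnings.warn(f"Statistic '{stats_name}' is not recognized; returning NaN.")
        return np.nan
    return float(STATS[key](arr))
```

Returning NaN with a warning, rather than raising, keeps a list request from failing because of one misspelt name; the trade-off is that a typo can go unnoticed if warnings are filtered out.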

Changes:

- NMAD added to geoutils.stats. Note: NMAD cannot be called directly on a raster because of circular imports; it must be called on an array-like.
- _statistics method: calculates common statistics (mean, median, max, min, sum, RMSE, etc.) for a specified band of the raster data.
- get_stats method: retrieves the requested statistics either by name (from the predefined list of aliases) or through a user-defined callable.
- _get_single_stat method: fetches a single statistic by matching its name against the aliases or function names. Prints a warning and returns NaN if the requested statistic is not in the list.
- Support for callable statistics: users can now provide any custom function that reduces the raster data (such as a custom percentile).
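As an illustration of what a _statistics-style method computes, here is a minimal sketch over the valid pixels of one band held in a NumPy masked array. The function name, the dictionary keys, the population standard deviation (ddof=0) and the choice of RMSE as the root mean square of the values themselves (treating them as residuals around zero, as for elevation differences) are assumptions; the actual geoutils implementation may differ. NMAD is the median absolute deviation scaled by 1.4826, which makes it consistent with the standard deviation for normally distributed data.

```python
import numpy as np


def band_statistics(band):
    """Hypothetical sketch of per-band statistics, ignoring masked and non-finite pixels."""
    values = np.ma.asarray(band, dtype=float).compressed()
    values = values[np.isfinite(values)]
    median = np.median(values)
    return {
        "mean": float(np.mean(values)),
        "median": float(median),
        "max": float(np.max(values)),
        "min": float(np.min(values)),
        "sum": float(np.sum(values)),
        "sum of squares": float(np.sum(values**2)),
        "90th percentile": float(np.percentile(values, 90)),
        # NMAD = 1.4826 * median(|x - median(x)|)
        "nmad": float(1.4826 * np.median(np.abs(values - median))),
        # RMSE of the values themselves (assumes they are residuals around zero)
        "rmse": float(np.sqrt(np.mean(values**2))),
        # Population standard deviation (ddof=0)
        "std": float(np.std(values)),
    }
```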

Usage Examples:

Get all statistics:
```
stats = raster.get_stats()
```
Get a single statistic (e.g., 'mean'):
```
mean_value = raster.get_stats("mean")
```

Get multiple statistics:
```
selected_stats = raster.get_stats(["mean", "max", "rmse"])
```

Using a custom callable statistic:
```
import numpy as np

def custom_stat(data):
    return np.nansum(data > 100)  # Count the number of pixels above 100

custom_value = raster.get_stats(custom_stat)
```
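Whether get_stats hands a custom callable a masked array or a NaN-filled array is not stated here, so a robust callable can handle both. The sketch below is a hypothetical version of the percentile_95 statistic mentioned in the tests, not the test's actual code: numpy.ma offers no percentile function, so masked pixels are dropped explicitly, and np.nanpercentile then skips any NaN values.

```python
import numpy as np


def percentile_95(data):
    """Custom statistic (hypothetical sketch): 95th percentile of the valid pixels."""
    # numpy.ma has no percentile function, so drop masked pixels explicitly.
    if np.ma.isMaskedArray(data):
        data = data.compressed()
    # nanpercentile additionally skips NaN values.
    return float(np.nanpercentile(data, 95))
```

It could then be requested alongside built-in names, for example raster.get_stats(["mean", "max", "std", percentile_95]).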

Tests

The following tests ensure that the new functionality in the get_stats method works as expected:

Full Statistics:
- The test retrieves all available statistics (mean, median, max, min, sum, etc.) for a raster and ensures each statistic is present and not None.
Single Statistic:
- A test is included to fetch a single statistic, "Average" (which is an alias for "Mean"), and verifies that the returned value is a float.
Selected Statistics:
- The test checks the retrieval of a selected set of statistics: mean, maximum, and std. Additionally, it includes the use of a custom statistic function (percentile_95), which computes the 95th percentile of the data.
- This validates that both built-in and user-defined statistics can be fetched successfully.
Unrecognized Statistics:
- Ensure that get_stats returns NaN and that a warning is raised if the requested statistic is not in the list of aliases.
NMAD tests in test_stats:
- Ensure that the scale factor behaves correctly.
- Check that the returned NMAD matches scipy.stats.median_abs_deviation.
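The two NMAD checks can be sketched with plain NumPy. The nmad function below and its scale-factor parameter name nfact are hypothetical stand-ins for the one in geoutils.stats. The scale factor should act linearly (nfact=1 gives the raw median absolute deviation). For reference, scipy.stats.median_abs_deviation(x, scale="normal") computes the same quantity with a constant of about 1.482602 instead of the rounded 1.4826, so a comparison against SciPy needs a small relative tolerance.

```python
import numpy as np


def nmad(data, nfact=1.4826):
    """Hypothetical NMAD: nfact * median(|x - median(x)|) over valid pixels."""
    values = np.ma.asarray(data, dtype=float).compressed()
    values = values[np.isfinite(values)]
    return float(nfact * np.median(np.abs(values - np.median(values))))


x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
raw_mad = nmad(x, nfact=1.0)  # median(|x - 3|) = median([2, 1, 0, 1, 97]) = 1
assert raw_mad == 1.0
# The scale factor multiplies the raw median absolute deviation; the outlier has no effect.
assert nmad(x) == 1.4826 * raw_mad
assert nmad(x, nfact=2.0) == 2.0 * raw_mad
```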

Documentation

A section about how to use get_stats has been added to the raster class documentation.

Test with xdem

geoutils/raster/raster.py

rhugonnet · 2024-12-06T22:54:46Z

Amazing! 😄
I only have small comments on structure and documenting the alias names above.
Also had to approve the test run (as this is your first PR on this repo, I think), all passing!

…stats()

adebardo

Can you add a note to the documentation?
Can you add a test with an inconsistent stats name and check the log returned?
Did you manage to get it running on xdem?

geoutils/raster/raster.py

tests/test_raster/test_raster.py

vschaffn · 2024-12-13T16:45:18Z

@adebardo I added a section about get_stats in the raster class documentation.
I have not managed to get it running on xDEM yet; I am not sure I can until GeoUtils is released with this update, because of circular import conflicts.

rhugonnet · 2024-12-13T18:17:44Z

@vschaffn To coordinate the changes between GeoUtils and xDEM, I typically do a pip install -e . in geoutils/ after installing xdem-dev with mamba. Then, if you are on the right branches for both packages locally, you should be able to run everything (tests, doc) with both versions where you want them to be.

If you want to run the tests/docs on xDEM's CI too, you can uncomment this line here in your xDEM PR: https://github.com/GlacioHack/xdem/blob/29a8cf4c8979d51723fe8b5a5a86e1fd5ba7cb96/environment.yml#L23. And potentially specify the GeoUtils branch at the end of the github link following https://pip.pypa.io/en/stable/topics/vcs-support/ 😉
This is especially useful if you know something might behave differently in CI than locally, or is exclusive to CI, for example when testing a dev change to the CI itself (otherwise you can usually wait for the GeoUtils release).

vschaffn · 2025-01-09T14:49:20Z

An example of get_stats() running on xdem has been added to the PR description

vschaffn added 3 commits December 3, 2024 17:16

feat: add more stats and get_stats method

62a687a

test: add raster stats tests

ee538d2

fix: add callable type in get_stats

2311ee6

vschaffn force-pushed the 660-add_missing_stats branch from 31c857d to 2311ee6 December 4, 2024 10:35

fix: more aliases for stats

c99e5ff