Add Xarray accessor mirroring `Raster` class #446
@adehecq @atedstone @erikmannerfelt This PR is also ready for your first review! It is not finalized, but at a good stage to hear your feedback, questions, recommendations, and then move forward to finalize it. Once you are done with this one, you should look at the one for the |
This PR adds the `rst` Xarray accessor mirroring the `Raster` class.

The accessor gives access to all attributes and methods already implemented for rasters in GeoUtils from an `xarray.DataArray` object (e.g., `ds.rst.reproject()`), and thereby to the rest of the Xarray functionalities through `ds` (such as plotting) and other low-level behaviour (such as implicit loading, cached loading, chunked loading).

It also opens up the opportunity to easily add Dask support to run our functionalities out-of-memory. This requires our functionalities to support a `da.Array` as input. This is not mandatory (if a functionality only runs on NumPy arrays, it will simply load the `da.Array` in memory immediately), but is very practical when it is supported.

We already did a lot of work to implement the most complex Dask functionalities in #537 (namely `reproject()`, `subsample()` and `interp_points()`). Most other functionalities are much easier to support (using `dask.array.map_blocks` or equivalent).

Resolves #383
Resolves #567
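For readers unfamiliar with the mechanism, here is a minimal sketch of how an Xarray accessor is registered. It relies on `xarray.register_dataarray_accessor`, which is Xarray's documented extension hook. The accessor name `rst_demo`, the class `DemoRasterAccessor` and its members are hypothetical stand-ins chosen for illustration, not the code of this PR.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor name ("rst_demo"), chosen to avoid clashing with "rst".
@xr.register_dataarray_accessor("rst_demo")
class DemoRasterAccessor:
    """Toy accessor exposing raster-like helpers on a DataArray."""

    def __init__(self, xarray_obj: xr.DataArray):
        # Xarray passes in the DataArray the accessor is attached to.
        self._obj = xarray_obj

    @property
    def shape2d(self):
        """Return (height, width) from the last two dimensions."""
        return tuple(self._obj.shape[-2:])

    def scaled(self, factor: float) -> xr.DataArray:
        """Return a new DataArray, so the caller stays in the Xarray world."""
        return self._obj * factor


ds = xr.DataArray(np.ones((3, 4), dtype="float32"), dims=("y", "x"))
result = ds.rst_demo.scaled(2.0)
```

Because `scaled()` returns a plain `xarray.DataArray`, everything Xarray offers (plotting, lazy loading, chunking) remains available on the result.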
Facilitated by recent code re-structuring
Recent code re-structuring moved methods out of the `Raster` or `Vector` class into separate modules. In some cases, this changed the arguments of those (non-public) wrapper methods to accept base inputs (array, transform, crs) instead of the class object itself. Among other things, this facilitates the transition to using our functions with an accessor that has a slightly different object type (classic NumPy array instead of a NumPy masked-array).

See #624.
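To illustrate why base inputs help, the sketch below shows module-level functions that take only an array shape, a transform, or an array, so they can be called from either object type. The function names `_bounds` and `_count_valid`, and the plain-tuple transform, are assumptions made for this example (the CRS is left out for brevity). They are not GeoUtils functions.

```python
import numpy as np


def _bounds(shape, transform):
    """Bounds (left, bottom, right, top) from base inputs only.

    `transform` is a hypothetical plain tuple (a, b, c, d, e, f) of affine
    coefficients, with x = a*col + b*row + c and y = d*col + e*row + f.
    This sketch assumes a north-up raster (b == d == 0, e < 0).
    """
    rows, cols = shape
    a, _, c, _, e, f = transform
    return (c, f + e * rows, c + a * cols, f)


def _count_valid(array):
    """Count valid pixels of a masked array or of a plain NaN-array."""
    data = np.ma.getdata(array)
    invalid = np.ma.getmaskarray(array)
    if np.issubdtype(data.dtype, np.floating):
        invalid = invalid | np.isnan(data)
    return int((~invalid).sum())


transform = (10.0, 0.0, 100.0, 0.0, -10.0, 200.0)
masked = np.ma.masked_equal(np.array([[1, -9999], [3, 4]]), -9999)
plain = np.array([[1.0, np.nan], [3.0, 4.0]])

bounds = _bounds(masked.shape, transform)
n_masked = _count_valid(masked)  # masked-array input
n_plain = _count_valid(plain)    # NaN-array input
```

Neither function needs to know which class it was called from, which is what lets a class method and an accessor method delegate to the same code.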
Summary of changes
Most content of the `Raster` class was moved into a non-public `RasterBase` parent class, containing all attributes and methods shared by the `Raster` and `rst` accessor classes.

The `Raster` and `rst` classes are subclasses of `RasterBase`, and only implement methods specific to their object type (such as `set_mask()` for `Raster`, which uses a NumPy masked-array, or its `__array_interface__` specific to masked arrays).

Remaining in `Raster` are only functionalities specific to the `Raster` object itself: `__init__`, `load()`, `is_loaded`, `from_array()`, `copy()`, `set_mask()`, etc, `__array_interface__`, `__add__`, etc.

Added in the `rst` accessor are only functionalities specific to the accessor: `__init__`, `from_array()`, `copy()`.

A new `_is_xr` boolean attribute identifies whether `RasterBase.data` is an Xarray object or not.

This allows making choices where necessary, which is only used to return the main attributes stored in the object itself: `data`, `crs`, `transform` and `nodata`.

All methods returning a raster object (like `reproject()` or `crop()`) now use `from_array()`, which is overridden in `Raster` and `rst` to ensure they return the same type as the input: a `Raster` input returns a `Raster`, and an `xarray.DataArray` input returns an `xarray.DataArray`.

All other attributes and methods return exactly the same non-raster output.
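The `from_array()` override pattern described above can be sketched as follows. The class names are hypothetical, and a plain `numpy.ndarray` stands in for `xarray.DataArray` to keep the example dependency-free. The point is only that the shared method is written once and the output type follows the input type.

```python
import numpy as np


class BaseSketch:
    """Hypothetical stand-in for a shared parent class."""

    def __init__(self, data):
        self.data = data

    def from_array(self, data):
        raise NotImplementedError  # overridden by each subclass

    def shifted(self, offset):
        # Shared logic is written once and rebuilds its output through
        # from_array(), so the output type follows the input type.
        return self.from_array(self.data + offset)


class RasterSketch(BaseSketch):
    """Stand-in for the masked-array based class."""

    def from_array(self, data):
        return RasterSketch(np.ma.asarray(data))


class AccessorSketch(BaseSketch):
    """Stand-in for the accessor: returns the underlying array type."""

    def from_array(self, data):
        return np.asarray(data)


raster_out = RasterSketch(np.ma.masked_equal([1, 2, -9999], -9999)).shifted(10)
array_out = AccessorSketch(np.array([1.0, 2.0, np.nan])).shifted(10)
```

Here `raster_out` is a `RasterSketch` that still carries its mask, while `array_out` is a plain array that carries its invalid value as NaN.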
A new function `geoutils.open_raster` is added to open a raster as an `xr.DataArray` (built on top of `rioxarray.open_rasterio()`). The difference is that our `open_raster` forces the data type to be `float32` at minimum, and replaces nodata values with NaNs to natively support most NumPy array operations with nodata propagation.

This seemed required because Rioxarray does not mask nodata values while preserving the nodata value in its metadata, which is incompatible with the behaviour we need. (To give a clear example: with Rioxarray, either you load the array with -9999 in it and `ds.rio.nodata` is -9999, or you load the array with NaNs in it and `ds.rio.nodata` is `NaN`.)

I did not find another way to do this here...
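The conversion described for `open_raster` can be approximated with NumPy alone. This is a sketch under assumptions: the helper name `nodata_to_nan` is hypothetical, the casting rule (keep `float32`/`float64`, cast everything else to `float32`) is one possible reading of "`float32` at minimum", and the actual function in this PR works on top of Rioxarray rather than on a bare array.

```python
import numpy as np


def nodata_to_nan(array, nodata):
    """Return a float copy of `array` with nodata pixels set to NaN.

    Assumption: float32 and float64 inputs keep their dtype, every other
    dtype is cast to float32 (large integers may lose precision).
    """
    if np.issubdtype(array.dtype, np.floating) and array.dtype.itemsize >= 4:
        out = array.copy()
    else:
        out = array.astype(np.float32)
    if nodata is not None:
        out[array == nodata] = np.nan
    return out


raw = np.array([[1, -9999], [3, 4]], dtype=np.int16)
nodata = -9999  # kept separately as metadata, not overwritten by NaN
converted = nodata_to_nan(raw, nodata)
```

Keeping `nodata` as separate metadata while the array holds NaNs is the combination the paragraph above says Rioxarray does not offer directly.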
New tests
Adding new tests is simple: we only need to check that all functions give the exact same result for a raster opened as a `Raster` or as an `xr.DataArray`.

For this, the new tests introduce a function to check the equality of a `Raster` and an `xarray.DataArray`.

Then, they check that all common attributes and methods of `RasterBase` run and return exactly the same output (or an output equal to the other object type when the output is a `Raster`/`xarray.DataArray`).
Discussion of core differences
The problem with the `rst` accessor object is that, if I'm not mistaken, we won't have access to functionalities that are not explicit such as `__array_interface__`. Thus, we likely cannot mirror the entire behaviour of the `Raster` class (for instance, no `overloading_check` to verify that the georeferencing is the same during an array or arithmetic operation). We can look more into it to be sure, but I don't think it is possible...

Thankfully, Xarray generally has similar behaviour to our `Raster` class, from the implicit loading to array-interfacing. We might want to adjust our functionalities to ensure we mirror that behaviour when possible, so that the code is written the same.

The main difference is that Xarray won't natively support nodata in its operations for integer arrays (no masked-array support in Xarray), and thus those need to be converted to NaN-arrays to do so, which increases RAM usage significantly for datasets of integer type. Here again, thankfully, chunked Dask-support can compensate for this, and run any NaN-array size.
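A small NumPy comparison makes the memory trade-off concrete. The values and the `uint8` dtype are made up for illustration. The sketch compares a masked integer array with its NaN-array equivalent.

```python
import numpy as np

values = np.array([[1, 2], [255, 4]], dtype=np.uint8)

# Masked-array route: integer data plus a boolean mask of the same shape.
masked = np.ma.masked_equal(values, 255)
bytes_masked = np.ma.getdata(masked).nbytes + np.ma.getmaskarray(masked).nbytes

# NaN-array route: integers must first be cast to a floating dtype.
as_nan = values.astype(np.float32)
as_nan[values == 255] = np.nan
bytes_nan = as_nan.nbytes

masked_sum = int(masked.sum())       # masked pixels are skipped
nan_sum = float(np.nansum(as_nan))   # NaNs must be skipped explicitly
plain_sum = float(as_nan.sum())      # otherwise NaN propagates
```

For this `uint8` example the NaN-array takes 16 bytes against 8 bytes for data plus mask. The ratio depends on the original integer dtype.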
So there are pros and cons to using the `Raster` or the `rst` accessor. We can try to reconcile differences where possible, and for those that are structural to the data objects, we should simply explain them clearly on a documentation page and leave the choice to users! 😄
TO-DO
Code
- `Raster.from_array()` to `RasterAccessor` class (or `RasterBase` class?) and individual setting operations (for `transform`, `crs`, `nodata`, and `area_or_point`) to make all methods (reproject, etc) naturally work on both `Raster` and `ds.rst`,
- `_reproject`, etc),
- `delayed` function (in `_reproject`, `_interp_points` and `_subsample`) by detecting automatically if input array is a Dask array.
- `RasterBase`,
- `RasterAccessor` functionalities (comparing to `ds.rio`),
- `xr.DataArray` objects as match-reference input.
Documentation
- `rst` accessor to "The georeferenced raster" page,
- `Raster.reproject()` or `ds.rst.reproject()`,
- `rst` accessor.
Other Dask support to add (will be moved as issues for later PRs)
- `reduce_points` function can copy the same logic as `interp_points`,
- `crop` function using `isel` of Rioxarray,
- `geocube` for `rasterize`/`polygonize` support,
- `proximity` function would be a bit of work...
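For the simpler, pixel-wise functionalities, block-wise Dask support can look like the sketch below. It uses `dask.array.map_blocks`, which applies a NumPy function to each chunk independently. The function `_double_valid` is a hypothetical placeholder for any per-pixel operation, and this is not how `reproject()`, `subsample()` or `interp_points()` are implemented in #537.

```python
import dask.array as da
import numpy as np


def _double_valid(block):
    """Per-block NumPy function: double valid pixels, keep NaNs as NaN."""
    return np.where(np.isnan(block), np.nan, block * 2.0)


arr = np.arange(16, dtype=np.float32).reshape(4, 4)
arr[0, 0] = np.nan

chunked = da.from_array(arr, chunks=(2, 2))  # four 2x2 chunks
lazy = da.map_blocks(_double_valid, chunked, dtype=chunked.dtype)  # nothing computed yet
result = lazy.compute()  # runs the function chunk by chunk
```

Operations that need neighbouring pixels across chunk edges do not fit this pattern directly. `dask.array.map_overlap` is the usual starting point for those.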