Contributing to improve memory usage #501
-
Hi @ameliefroessl, Thanks a lot for opening this discussion, this is one of our priorities and we'd be very happy to have some help! I can see you looked at the code quite a bit 🙂. You are right for the most part on what functions would have to be adjusted, some more details below. Before going into the technical bits of the code, the main concept we have to consider to reduce memory usage is how the raster data is read and sampled, and that is really the critical point for now. In short, there are two options:
1a. Add an Xarray accessor to xDEM, including Xarray input support, so that rasters are read and processed lazily through Xarray/Dask.
1b. Implement the subsampled, windowed reading directly with Rasterio.
I would go for 1a, which is certainly longer to implement but would have durable backends (Dask, Xarray)! It's a big chunk of work (no pun intended). In a way, 1a includes the sampling of 1b, which would be done in Xarray/Dask instead of Rasterio. If you have other ideas and you're still interested in contributing, I'm happy to discuss more! I could share a short list of functions that would have to be modified, to try to map out exactly what would need to be done in the code.
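To make 1a a bit more concrete, here is a rough sketch (not a final design) of chunked, Dask-backed reading through rioxarray combined with a registered Xarray accessor; the accessor name `dem` and its `sample_points` method are placeholders, not existing xDEM API:

```python
import numpy as np
import rioxarray
import xarray as xr


@xr.register_dataarray_accessor("dem")
class DemAccessor:
    """Hypothetical accessor exposing xDEM-style methods on a (lazy) DataArray."""

    def __init__(self, da: xr.DataArray):
        self._da = da

    def sample_points(self, n: int = 10_000, seed: int = 42) -> np.ndarray:
        """Load only n randomly chosen pixel values, never the full raster."""
        flat = self._da.stack(pix=("y", "x"))  # still lazy with Dask chunks
        idx = np.random.default_rng(seed).choice(flat.sizes["pix"], n, replace=False)
        return flat.isel(pix=idx).compute().values


# Each 2048x2048 block is only read from disk when a computation touches it.
da = rioxarray.open_rasterio("large_dem.tif", masked=True, chunks={"x": 2048, "y": 2048})
points = da.squeeze("band").dem.sample_points(n=5_000)
```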
-
Just a comment to remind you that this was discussed in previous discussions (see #329). It makes sense if the new implementation of the slope is a lot faster; otherwise, we got really good performance with the current one.
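If someone wants to check that, a candidate could be timed against the current function with something as simple as the sketch below, where `new_slope` is a hypothetical stand-in for the proposed implementation:

```python
import timeit

import xdem

dem = xdem.DEM("dem.tif")

t_current = timeit.timeit(lambda: xdem.terrain.slope(dem), number=3)
print(f"current xdem.terrain.slope: {t_current / 3:.2f} s per run")
# t_new = timeit.timeit(lambda: new_slope(dem), number=3)  # hypothetical candidate
```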
-
Hi,
What a great project, the results are very impressive! Unfortunately I'm running into some memory issues when running it with larger DEMs (33666x33666 pixels). The current pipeline I'm running is a Deramp step followed by NuthKaab coregistration.
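For context, a minimal sketch of such a two-step pipeline using xDEM's coreg classes; the file names are placeholders rather than my exact code:

```python
import xdem

ref_dem = xdem.DEM("reference_dem.tif")
tba_dem = xdem.DEM("dem_to_align.tif")  # the large 33666x33666 raster

# Deramp followed by Nuth & Kaab, chained into one coregistration pipeline.
pipeline = xdem.coreg.Deramp() + xdem.coreg.NuthKaab()
pipeline.fit(ref_dem, tba_dem)
aligned = pipeline.apply(tba_dem)
```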
I saw that there are already some discussions in the issues about how to improve the memory usage of the project. I was wondering if I could contribute somehow to try and improve this? I would like to propose a "divide and conquer" approach. Specifically:
Deramp:
I think it would be possible to divide the raster into windows (see rasterio's windowed reading and writing, for example) and extract the relevant data points per window, iterating over the windows and reading them one by one. Once you have the data points (possibly subsampled), you can run `_fit_func()` on that subset of the data. The approximated function could then be applied to the raster in a windowed way as well. This way you could avoid holding the full raster as a `numpy` array in memory.
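A rough sketch of this idea, assuming plain rasterio and numpy, with a simple least-squares plane standing in for xDEM's internal `_fit_func()` (block iteration, sample counts, and the polynomial degree are arbitrary here):

```python
import numpy as np
import rasterio
from rasterio.transform import xy

rng = np.random.default_rng(0)
samples = []
with rasterio.open("large_dem.tif") as src:
    # Iterate over the raster's native blocks; only one block is in memory at a time.
    for _, window in src.block_windows(1):
        block = src.read(1, window=window, masked=True)
        rows, cols = np.nonzero(~np.ma.getmaskarray(block))
        if rows.size == 0:
            continue
        keep = rng.choice(rows.size, min(500, rows.size), replace=False)
        # Convert block-local indices to full-raster map coordinates.
        xs, ys = xy(src.transform, rows[keep] + window.row_off, cols[keep] + window.col_off)
        samples.append(np.column_stack([xs, ys, block.data[rows[keep], cols[keep]]]))

xyz = np.vstack(samples)
# Stand-in for _fit_func(): least-squares plane z = a*x + b*y + c on the subsample.
A = np.column_stack([xyz[:, 0], xyz[:, 1], np.ones(len(xyz))])
coefs, *_ = np.linalg.lstsq(A, xyz[:, 2], rcond=None)
# A second windowed pass would then subtract the fitted ramp block by block.
```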
NuthKaab:
I realize this one is a bit trickier to "divide and conquer", since you model the elevation function over the whole raster with `scipy.interpolate.RectBivariateSpline`. However, since the final output is an x/y/z shift applied to the whole image, I could imagine something like the following might work: divide the raster into windows, estimate the shifts per window, average the estimated shifts across all windows, apply the average shift to each window, and continue until convergence. Then apply the final x/y/z shift to each window in sequence.
As I'm no expert on bivariate splines, I am unaware of the possible side effects such a division of the raster would have when modelling it, especially around the edges of the windows. However, one possible mitigation strategy could be to overlap the windows slightly.
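Schematically, the iteration could look like the sketch below, where `read_window` and `estimate_shift` are hypothetical placeholders for reading an offset pair of window blocks and running the Nuth & Kääb fit on a single window:

```python
import numpy as np


def windowed_nuth_kaab(read_window, windows, estimate_shift,
                       max_iters: int = 50, tol: float = 0.01):
    """Iterate per-window shift estimates, averaged into one global shift."""
    total = np.zeros(3)  # accumulated (dx, dy, dz)
    for _ in range(max_iters):
        shifts = []
        for w in windows:
            # Read the reference and to-be-aligned blocks, offset by the shift so far.
            ref_block, tba_block = read_window(w, offset=total)
            shifts.append(estimate_shift(ref_block, tba_block))
        step = np.mean(shifts, axis=0)  # average the estimates across all windows
        total += step
        if np.linalg.norm(step[:2]) < tol:  # converged on the horizontal shift
            break
    return total  # applied window by window in a final pass
```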
It is entirely possible that I misunderstood how the algorithms work or that I've missed some crucial information, so any feedback or thoughts on these approaches are greatly appreciated! :) Looking forward to your reply.
Thanks in advance,
Amelie