Contributing to improve memory usage #501
-
Hi @ameliefroessl, Thanks a lot for opening this discussion, this is one of our priorities and we'd be very happy to have some help! I can see you looked at the code quite a bit 🙂. You are right for the most part on what functions would have to be adjusted, some more details below. Before going into the technical bits of the code, the main concept we have to consider to reduce memory usage is how the raster data is read and sampled, and that is really the critical point for now. In short, there are two options:
1a. Add an Xarray accessor to xDEM, including Xarray input support, so that rasters are read and processed lazily through Xarray/Dask.
1b. Implement the subsampled, windowed reading directly with Rasterio.
I would go for 1a, which is certainly longer to implement but would have durable backends (Dask, Xarray)! It's a big chunk of work (no pun intended). In a way, 1a includes the sampling of 1b, which would be done in Xarray/Dask instead of Rasterio. If you have other ideas and you're still interested in contributing, I'm happy to discuss more! I could share a short list of functions that would have to be modified, to try to map out exactly what would need to be done in the code.
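To make 1a a bit more concrete, here is a rough sketch (not a final design) of chunked, Dask-backed reading through rioxarray combined with a registered Xarray accessor; the accessor name `dem` and its `sample_points` method are placeholders, not existing xDEM API:

```python
import numpy as np
import rioxarray
import xarray as xr


@xr.register_dataarray_accessor("dem")
class DemAccessor:
    """Hypothetical accessor exposing xDEM-style methods on a (lazy) DataArray."""

    def __init__(self, da: xr.DataArray):
        self._da = da

    def sample_points(self, n: int = 10_000, seed: int = 42) -> np.ndarray:
        """Load only n randomly chosen pixel values, never the full raster."""
        flat = self._da.stack(pix=("y", "x"))  # still lazy with Dask chunks
        idx = np.random.default_rng(seed).choice(flat.sizes["pix"], n, replace=False)
        return flat.isel(pix=idx).compute().values


# Each 2048x2048 block is only read from disk when a computation touches it.
da = rioxarray.open_rasterio("large_dem.tif", masked=True, chunks={"x": 2048, "y": 2048})
points = da.squeeze("band").dem.sample_points(n=5_000)
```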
-
Just a comment to remind you that this was discussed in previous discussions (see #329). It makes sense if the new implementation of the slope is a lot faster; otherwise, we got really good performance with the current one.
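If someone wants to check that, a candidate could be timed against the current function with something as simple as the sketch below, where `new_slope` is a hypothetical stand-in for the proposed implementation:

```python
import timeit

import xdem

dem = xdem.DEM("dem.tif")

t_current = timeit.timeit(lambda: xdem.terrain.slope(dem), number=3)
print(f"current xdem.terrain.slope: {t_current / 3:.2f} s per run")
# t_new = timeit.timeit(lambda: new_slope(dem), number=3)  # hypothetical candidate
```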
-
Hi,
What a great project, the results are very impressive! Unfortunately I'm running into some memory issues when running it with larger DEMs (33666x33666 pixels). The current pipeline I'm running is a Deramp step followed by NuthKaab coregistration.
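For context, a minimal sketch of such a two-step pipeline using xDEM's coreg classes; the file names are placeholders rather than my exact code:

```python
import xdem

ref_dem = xdem.DEM("reference_dem.tif")
tba_dem = xdem.DEM("dem_to_align.tif")  # the large 33666x33666 raster

# Deramp followed by Nuth & Kaab, chained into one coregistration pipeline.
pipeline = xdem.coreg.Deramp() + xdem.coreg.NuthKaab()
pipeline.fit(ref_dem, tba_dem)
aligned = pipeline.apply(tba_dem)
```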
I saw that there are already some discussions in the issues about how to improve the memory usage of the project. I was wondering if I could contribute somehow to try and improve this? I would like to propose a "divide and conquer" approach. Specifically:
Deramp:
I think it would be possible to divide the raster into windows (see rasterio's windowed reading and writing, for example) and extract the relevant data points per window, iterating over the windows and reading them one by one. Once you have the data points (possibly subsampled), you can run `_fit_func()` on that subset of the data. The approximated function could then be applied to the raster in a windowed way as well. This way you could avoid holding the full raster as a `numpy` array in memory.
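A rough sketch of this idea, assuming plain rasterio and numpy, with a simple least-squares plane standing in for xDEM's internal `_fit_func()` (block iteration, sample counts, and the polynomial degree are arbitrary here):

```python
import numpy as np
import rasterio
from rasterio.transform import xy

rng = np.random.default_rng(0)
samples = []
with rasterio.open("large_dem.tif") as src:
    # Iterate over the raster's native blocks; only one block is in memory at a time.
    for _, window in src.block_windows(1):
        block = src.read(1, window=window, masked=True)
        rows, cols = np.nonzero(~np.ma.getmaskarray(block))
        if rows.size == 0:
            continue
        keep = rng.choice(rows.size, min(500, rows.size), replace=False)
        # Convert block-local indices to full-raster map coordinates.
        xs, ys = xy(src.transform, rows[keep] + window.row_off, cols[keep] + window.col_off)
        samples.append(np.column_stack([xs, ys, block.data[rows[keep], cols[keep]]]))

xyz = np.vstack(samples)
# Stand-in for _fit_func(): least-squares plane z = a*x + b*y + c on the subsample.
A = np.column_stack([xyz[:, 0], xyz[:, 1], np.ones(len(xyz))])
coefs, *_ = np.linalg.lstsq(A, xyz[:, 2], rcond=None)
# A second windowed pass would then subtract the fitted ramp block by block.
```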
NuthKaab:
I realize this one is a bit trickier to "divide and conquer", since you model the elevation function over the whole raster with `scipy.interpolate.RectBivariateSpline`. However, since the final output is an x/y/z shift applied to the whole image, I could imagine something like the following might work: divide the raster into windows, estimate the shifts per window, average the estimated shifts across all windows, apply the average shift to each window, and continue until convergence. Then apply the final x/y/z shift to each window in sequence.
As I'm no expert on bivariate splines, I am unaware of the possible side effects such a division of the raster would have when modelling it, especially around the edges of the windows. However, one possible mitigation strategy could be to overlap the windows slightly.
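Schematically, the iteration could look like the sketch below, where `read_window` and `estimate_shift` are hypothetical placeholders for reading an offset pair of window blocks and running the Nuth & Kääb fit on a single window:

```python
import numpy as np


def windowed_nuth_kaab(read_window, windows, estimate_shift,
                       max_iters: int = 50, tol: float = 0.01):
    """Iterate per-window shift estimates, averaged into one global shift."""
    total = np.zeros(3)  # accumulated (dx, dy, dz)
    for _ in range(max_iters):
        shifts = []
        for w in windows:
            # Read the reference and to-be-aligned blocks, offset by the shift so far.
            ref_block, tba_block = read_window(w, offset=total)
            shifts.append(estimate_shift(ref_block, tba_block))
        step = np.mean(shifts, axis=0)  # average the estimates across all windows
        total += step
        if np.linalg.norm(step[:2]) < tol:  # converged on the horizontal shift
            break
    return total  # applied window by window in a final pass
```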
It is entirely possible that I misunderstood how the algorithms work or that I've missed some crucial information, so any feedback or thoughts on these approaches are greatly appreciated! :) Looking forward to your reply.
Thanks in advance,
Amelie