-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raster size limit? #17
Comments
Same issue as olenaboiko303. I write to provide more detailed information: I have been using the gdistance package for years for cost path analysis. Thank you for the great work. Recently, I was tasked to support the computation of accumulated cost using a cost surface (raster) that represents travel time on a road network. The road network cost surface r is class : RasterLayer The transition layer is computed as: The tr objects seems to compute normally:
The output raster though (cs object) only contains 0 values. I next started clipping the input raster: At 600x600km centered on facility.xy I get:
Note that the sparse matrix in tr now has more value cells, 6512490. With the entire raster, the sparse matrix had 6030184 value cells. class : SpatRaster When the area is clipped down to 500x500km centered on facility.xy I get the expected result:
The workstation has 250GB of RAM and about 180GB remain available at runtime. Any resolution on this issue would be greatly appreciated. Demetrios
attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] doSNOW_1.0.20 snow_0.4-4 iterators_1.0.14 foreach_1.5.2 gdistance_1.6 Matrix_1.5-1 igraph_1.4.1 raster_3.6-14 sp_1.6-0 terra_1.7-15 loaded via a namespace (and not attached): [1] Rcpp_1.0.10 lattice_0.20-45 codetools_0.2-18 grid_4.2.2 magrittr_2.0.3 rlang_1.0.6 cli_3.6.0 tools_4.2.2 parallel_4.2.2 compiler_4.2.2 [11] pkgconfig_2.0.3 |
I may be having a similar issue. I need to create a conductance layer with a large extent(-18, 142, -32, 44), but the costDistance function does not work . I get the following error: `Error in validObject(.Object): invalid class “dsparseVector” object: 'i' slot is not strictly increasing after truncation towards zero
Conductance layer: This conductance layer with a smaller extent does work as it should:
My R session info: Matrix products: default locale: attached base packages: other attached packages: loaded via a namespace (and not attached): and the R script used to create the conductance layer: `library(gdistance) alt <-raster('/filepath/') w <- alt.c altDiff <- function(x){x[2] - x[1]}# adj <- adjacent(alt.nona, cells=1:ncell(alt.nona), pairs=TRUE, directions=8) s <- calc(alt.c, fun=function(x){ x[x != "NA"] <- 1; return(x)} ) Has anyone figured out a solution for this? If not, is it possible to splice two separate layers? I would appreciate any insight. |
I don't expect it to help, but I would start by updating gdistance, Matrix, raster, and terra. They've all had significant updates and bug fixes since the versions you have, and maybe we'll get lucky. |
I think that did the trick - thanks Andrew. I do get a warning message, but fortunately the output is the same as when I tested the function with a smaller conductance layer in the old environment. Here is my R session info if anyone is interested:
Warning message:
|
Unfortunately I'm still unable to use the costDistance function with the extent(-18, 142, -32, 44) that I require. The environment I created above ended up producing an error when it was originally just a warning:
I created a new environment with the latest versions of r and g-distance (as well as a new conductance layer) and get the following error:
Session info: Matrix products: default locale: attached base packages: other attached packages: loaded via a namespace (and not attached): It appears to be an issue/limitation with the Matrix package and all.equal function. I haven't had any luck with this - any suggestions? Many thanks! |
Unfortunately, my current computer only has 32GB of ram, which isn't enough to actually run raster data with 100+ million cells to try and pinpoint exactly what's happening. But here's a guess: Given a 100 million cell raster, the transition matrix is basically going to be a 100M x 100M matrix. Rather than storing all the zeros in memory (which would make the matrix about 80 petabytes), we can use various schemes to just keep track of the location of non-zero values. A simple one for this explanation is just the row and column numbers. Those row and column numbers will never be larger than the original number of raster cells. In other words, a non-zero value at the very end of the matrix would be row 100M and column 100M. The problem appears to be that all.equal() is converting the sparse matrix to a sparse vector for the comparison. There's nothing wrong with this approach, and it may even make the most sense practically, but it does mean that instead of representing the location of each non-zero with two index values, it's now going to do it with a single index value. In order to do that, a single index value will have to represent numbers up to the width x height of the matrix. So 100 million squared in this case, which is 1x10^16, or the index for a value in the very last spot of the matrix. For some reason, the Matrix package is using the double data type (numeric in R on 64-bit systems) to represent this single index value for sparse vectors. The double/numeric data type only has 2^53 bits of precision (which is where this number in the error came from). 2^53 =9.007x10^15, which is the largest integer index a double/numeric data type can accurately represent before precision issues start to pop up. Unfortunately, that's slightly too small for trying to convert the 100Mx100M matrix to a sparse vector. So basically, right around the 100M mark for number of raster cells, some things might start fall apart. The gdistance package wasn't really designed for such large rasters (the original author appeared to focus on readability and maintainability, which is totally reasonable). It might work for some things at this scale, but I can't guarantee anything. Unfortunately, I don't know what alternatives might work. You could try looking at the costDist() function in the terra package. You could also look at GRASS GIS which has the r.cost tool, which I think is available in QGIS. I'm not sure if anyone has ever worked out or implemented a tiling approach for most of the metrics in gdistance (Omniscape sort of does for circuit flow) |
Thanks for the explanation Andrew.
I had empirically determined that the largest working dimensionality was just shy of 95000 rows and columns. Given the 2^53 limitation, the exact number is 94906 x 94906. I completely understand that there was no provision for such large datasets at gdistance package inception time, but for many interesting applications where gdistance is rather useful, currently this is a considerable limitation.
I am wondering how much work would it take to use Rcpp and package bit64 that offers a integer64 type to do the conversion from the sparce matrix to a vector of indices. If implemented, it will push the max integer that can be represented without loss of precision from 2^53 to 2^60-1. The latter translates to a square matrix of 1,7073,741,823 rows and columns.
Many thanks for all your efforts.
Demetrios
From: Andrew Marx ***@***.***>
Sent: Friday, December 15, 2023 7:08 PM
To: AgrDataSci/gdistance ***@***.***>
Cc: Gatziolis, Demetrios - FS, OR ***@***.***>; Manual ***@***.***>
Subject: Re: [AgrDataSci/gdistance] raster size limit? (Issue #17)
Unfortunately, my current computer only has 32GB of ram, which isn't enough to actually run raster data with 100+ million cells to try and pinpoint exactly what's happening.
But here's a guess:
Given a 100 million cell raster, the transition matrix is basically going to be a 100M x 100M matrix. Rather than storing all the zeros in memory (which would make the matrix about 80 petabytes), we can use various schemes to just keep track of the location of non-zero values. A simple one for this explanation is just the row and column numbers. Those row and column numbers will never be larger than the original number of raster cells. In other words, a non-zero value at the very end of the matrix would be row 100M and column 100M.
The problem appears to be that all.equal() is converting the sparse matrix to a sparse vector for the comparison. There's nothing wrong with this approach, and it may even make the most sense practically, but it does mean that instead of representing the location of each non-zero with two index values, it's now going to do it with a single index value. In order to do that, a single index value will have to represent numbers up to the width x height of the matrix. So 100 million squared in this case, which is 1x10^16, or the index for a value in the very last spot of the matrix.
For some reason, the Matrix package is using the double data type (numeric in R on 64-bit systems) to represent this single index value for sparse vectors. The double/numeric data type only has 2^53 bits of precision (which is where this number in the error came from). 2^53 =9.007x10^15, which is the largest integer index a double/numeric data type can accurately represent before precision issues start to pop up. Unfortunately, that's slightly too small for trying to convert the 100Mx100M matrix to a sparse vector.
So basically, right around the 100M mark for number of raster cells, some things might start fall apart. The gdistance package wasn't really designed for such large rasters (the original author appeared to focus on readability and maintainability, which is totally reasonable). It might work for some things at this scale, but I can't guarantee anything.
Unfortunately, I don't know what alternatives might work. You could try looking at the costDist() function in the terra package. You could also look at GRASS GIS which has the r.cost tool, which I think is available in QGIS. I'm not sure if anyone has ever worked out or implemented a tiling approach for most of the metrics in gdistance (Omniscape sort of does for circuit flow)
-
Reply to this email directly, view it on GitHub<#17 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/A6FSJWZTBIT3R2FSUWNB6QLYJUGCNAVCNFSM6AAAAAAVETE7T2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJYGY4TKNBWGY>.
You are receiving this because you are subscribed to this thread.Message ID: ***@***.******@***.***>>
This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
|
Thank you for the detailed explanation and alternative suggestions Andrew. I understand that the package was not originally intended for such large raster layers. If there is any possibility of a working solution I have access to hpc resources and could attempt to help out on my end. Currently, I am attempting to run the costDistance function with a smaller layer: extent (-20, 55, -40, 40) or (10, 35, -35, 30) but get 'Inf' as the output with no error message; I also have to set the crs for the raster layer as it is originally NA. I do know that the following extent (71, 135, -10, 41) has worked in the past for a colleague (gdistance version 1.2-2 or earlier), but when I replicate it with my current environment I get 'Inf' as the output. The function does work with the smallest I've tried (and has a set crs): (30, 55, 0, 15). I'm not quite sure if it's a bug or something wrong with my own environment. I would be grateful for any insight. Thank you again for your support. R session Info: R version 4.3.2 (2023-10-31) attached base packages: |
If you're getting If you can't find an issue with that, try looking at the values in your transition matrix before and after the geocorrection. If the geocorrection is changing reasonable values to unreasonable values ( The
My main priority is just keeping the package functional and available on CRAN. In terms of enhancements, I might occasionally rewrite some calculations to improve performance if it isn't too involved. But if it requires significant changes, especially in package dependencies, then it probably won't happen. Partly because I want to keep the package easy to maintain and install, and partly because I'm not funded in any way to improve this package, so I can't spend a lot of time on it. I have another related connectivity package that I'm significantly more invested in, prioritizes performance, and I'm considering implementing similar equivalent functionality there. So If I am able to optimize these functions, they'll show up there. |
Hi Andrew - thank you for getting back to me and apologies for the delayed response. I ended up using an older version of R (4.2.1) and gdistance (1.6.2) that seems to have worked. I created a raster layer with the following dimensions: I couldn't resolve the issue with the R session info below: R version 4.2.1 (2022-06-23) Matrix products: default attached base packages: other attached packages: loaded via a namespace (and not attached): Edit: working version is 1.6.2 on R version 4.2.1 |
I wouldn't spend your time troubleshooting the function, but if you could try a few different intermediate versions to see when the behavior changed, then I can take a look at what changed |
I made a correction to my post above; I was using gdistance version 1.6.2 and R version 4.2.1 that was available on the server I'm working on. Not quite sure what it means for the package in that case. |
I added a significantly optimized version of |
I tried making a new raster with both functions and still get the |
Is there a friction raster size limit to process with
gdistance
?When I run the tool on a 4801x4381 raster, I produce the expected output. When I run it on a friction surface with 9295x12929 size, the output is still being produced (I use HPC resources to have enough memory, I used 256Gb) but it's completely wrong (for example, the pixel which fell into one of the targets is expected to have 0 for travel time output, but I'm seeing 8574.740234 value).
The text was updated successfully, but these errors were encountered: