Overflow error on 3D datasets #38
Comments
Can you also use Ripser on high-dimensional data, like 10D?
Computing homology in dimension k for a dataset of size N requires that it be possible, in principle, to enumerate all (k+1)-simplices of the full simplex on N vertices. In your case (k=2) this means enumerating all 3-simplices (which are 4-tuples of vertices), and this requires that the binomial coefficient (N choose 4) not exceed 2^63 - 1 (the largest index representable as a signed 64-bit integer). With N=250000, N choose 4 is approximately 1.6e20, while 2^63 is about 9.2e18. [Edit: I may be off by 1 in some of the above, but the main point remains.] After modifying the backend to work with 128-bit integers, you should be able to run Ripser on your dataset. Both @ubauer and @MonkeyBreaker have tried this at some point AFAIK. It can be done!
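To make the arithmetic above easy to reproduce, here is a minimal Python sketch of the criterion as stated in this comment. The function names are made up for illustration, and the check mirrors the description here rather than Ripser's actual internal overflow test, which may differ by an off-by-one or by bits reserved for other purposes.

```python
from math import comb

INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit integer


def simplex_count(n_points: int, homology_dim: int) -> int:
    """Number of (homology_dim + 1)-simplices on n_points vertices,
    i.e. the number of (homology_dim + 2)-tuples of vertices."""
    return comb(n_points, homology_dim + 2)


def fits_in_int64(n_points: int, homology_dim: int) -> bool:
    """True if every such simplex can be given a signed 64-bit index."""
    return simplex_count(n_points, homology_dim) <= INT64_MAX


def max_points(homology_dim: int) -> int:
    """Largest N passing the check above, found by binary search
    (the count is monotone in N)."""
    lo, hi = homology_dim + 2, 2**63
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fits_in_int64(mid, homology_dim):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    for n in (120_000, 250_000):
        print(n, simplex_count(n, 2), fits_in_int64(n, 2))
    print("largest N for k=2:", max_points(2))
```

Under this criterion the limit for k=2 comes out at roughly 1.2e5 points, which is consistent with the dataset sizes reported in this issue: 120k vertices work, 250k do not.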
@MassEast the dimensionality of the data is not an obstacle: Ripser only looks at the matrix of pairwise distances between the points in the dataset. This matrix takes a little longer to compute in higher dimensions, but only marginally so, especially compared to the time it takes to compute persistence.
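As a small illustration of this point, the sketch below builds the pairwise distance matrix for 50 random points in 3D and in 10D using only the Python standard library. The helper name `pairwise_distances` and the random point clouds are assumptions for the example. The matrix has the same shape in both cases, and the ambient dimension only affects the cost of each individual distance.

```python
import math
import random


def pairwise_distances(points):
    """Dense symmetric matrix of Euclidean distances between all pairs."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # works in any dimension
            dist[i][j] = dist[j][i] = d
    return dist


rng = random.Random(0)  # fixed seed so the example is reproducible
points_3d = [[rng.random() for _ in range(3)] for _ in range(50)]
points_10d = [[rng.random() for _ in range(10)] for _ in range(50)]

d3 = pairwise_distances(points_3d)
d10 = pairwise_distances(points_10d)

# Both matrices are 50 x 50, whatever the dimension of the input points.
print(len(d3), len(d3[0]), len(d10), len(d10[0]))
```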
Thanks @ulupo for the information.
There is a branch at https://github.com/Ripser/ripser/tree/128bit implementing this. It should be mostly up to date, and it's actually pretty much a one-line edit. Since 128-bit ints are not standardized, this might not work out of the box with some compilers.
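For a rough sense of how much headroom a 128-bit index gives, the following Python sketch computes how many bits a signed integer needs to hold the simplex count for a given number of points. The function names are hypothetical, and this only reproduces the counting argument from this thread, not the code on the branch.

```python
from math import comb


def bits_needed(n_points: int, homology_dim: int) -> int:
    """Bits a signed integer needs to hold C(n_points, homology_dim + 2):
    the magnitude bits plus one sign bit."""
    return comb(n_points, homology_dim + 2).bit_length() + 1


def fits_signed(n_points: int, homology_dim: int, width: int) -> bool:
    """True if the simplex count fits in a signed integer of `width` bits."""
    return bits_needed(n_points, homology_dim) <= width


if __name__ == "__main__":
    for n in (250_000, 1_000_000):
        print(n, bits_needed(n, 2),
              fits_signed(n, 2, 64), fits_signed(n, 2, 128))
```

By this count, both the 250k-vertex datasets and a 1M-vertex dataset at k=2 exceed 64 bits but stay well within 128 bits. Whether such a computation is feasible in time and memory is a separate question.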
Thanks!
Hello and thank you for your work!
I am trying to use Ripser to compute persistence diagrams of rather large 3D datasets, coming from the Open-SciVis-Datasets website.
However, Ripser throws a std::overflow_error when I try to compute persistence pairs up to dimension 2. If I understand the C++ code correctly, this exception is related to the computation of binomial coefficients, which depend on the size of the input distance matrix and the maximum dimension requested. This issue might be related to #25 and #32, since they feature similar problems. I already opened an issue in the scikit-tda Python wrapper (I wanted to use Ripser via this Python API), but since the problem lies in the C++ code, it seems to belong here.
Below, you will find a Python script I used to trigger this overflow error. It takes a raw file from Open-SciVis-Datasets and iterates over the edges of the cubical complex to generate a sparse matrix in sparse triplet format that is fed to the Ripser executable. Although the diagram is computed as expected with the smallest datasets (nucleon, marschner_lobb, silicium, up to 120k vertices), the error occurs with fuel, neghip (more than 250k vertices) and every larger dataset.
Is there a way to circumvent the computation of these binomial coefficients in order to handle large datasets (up to 1M vertices)?
Thanks in advance for your help,
Best regards,
Pierre Guillou