
Different results for numpy and dask arrays #11

Open
sfinkens opened this issue Aug 31, 2021 · 2 comments
Labels
bug Something isn't working

Comments


Describe the bug
The textory.statistics.variogram method yields different results for numpy and dask arrays.

To Reproduce

In [1]: from textory.statistics import variogram

In [2]: import dask.array as da

In [3]: arr = da.arange(100).reshape((10, 10))

In [4]: variogram(arr, lag=4).compute()
Out[4]: 401.475

In [5]: variogram(arr.compute(), lag=4)
Out[5]: 251.49

Expected behavior
The two results should be identical :)

Environment Info:

  • OS: Linux
  • Textory Version: 0.2.7b0
  • Dask Version: 2021.08.1
@sfinkens sfinkens added the bug Something isn't working label Aug 31, 2021
Owner

BENR0 commented Aug 31, 2021

@sfinkens thanks for reporting this. I have been aware of this bug to some degree. It is a little sloppy on my part, and I have been meaning for quite some time to document more thoroughly the limitations of how some of the calculations are implemented.

That said, in this case the culprit is how border values are treated by map_overlap in the dask case. Because the example uses arange, the difference between the two calculations becomes very obvious; with random values the difference gets smaller. Nevertheless, this is a bug.
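To illustrate the border effect with plain numpy (this is a minimal sketch, not textory's actual variogram implementation, and the numbers are for the sketch only): the `variogram_padded` variant mimics what a reflecting boundary in map_overlap does at the array edges, where synthetic padded pairs enter the mean, while `variogram_valid` only uses genuine pixel pairs.

```python
import numpy as np

def variogram_valid(x, lag):
    # use only genuine pixel pairs `lag` apart along the last axis
    d = x[:, lag:] - x[:, :-lag]
    return 0.5 * (d ** 2).mean()

def variogram_padded(x, lag):
    # pad the border by reflection first, similar to what a
    # boundary="reflect" overlap does, so synthetic edge pairs
    # enter the mean and shift the result
    xp = np.pad(x, ((0, 0), (lag, 0)), mode="reflect")
    d = xp[:, lag:] - xp[:, :-lag]
    return 0.5 * (d ** 2).mean()

x = np.arange(100.0).reshape(10, 10)
print(variogram_valid(x, 4))   # 8.0: every valid pair differs by exactly 4
print(variogram_padded(x, 4))  # 6.0: reflected border pairs pull the mean down
```

With random data the two values converge because the border pairs are a small, statistically similar fraction of all pairs, which matches the observation above.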

I have been thinking on and off about how to improve the accuracy of the calculations with regard to how borders are treated, but I haven't come up with a good approach yet that doesn't make the calculation significantly slower. I have a workaround in mind, at least for the functions in the statistics module, but I haven't implemented or tested it yet. How quickly would you need a solution?

Another question I have: how would you feel, from a user's point of view, about an additional dependency such as Numba?

Author

sfinkens commented Sep 7, 2021

Thanks for the feedback @BENR0! There's no hurry, we're still in a testing phase and evaluating the best solution for our application. That is, computing block-wise variograms like so:

data_array.coarsen(lon=boxsize, lat=boxsize).reduce(textory.statistics.variogram, lag=lag)
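For illustration, that coarsen/reduce pattern can be mimicked with plain numpy reshaping; the `variogram` below is a simplified stand-in for `textory.statistics.variogram` (not its actual definition), just to show the tiling:

```python
import numpy as np

def variogram(x, lag=2):
    # illustrative variogram along the last axis; textory's
    # actual definition may differ
    d = x[..., lag:] - x[..., :-lag]
    return 0.5 * (d ** 2).mean(axis=(-2, -1))

def blockwise_variogram(data, boxsize, lag):
    # split a (lat, lon) grid into boxsize x boxsize tiles,
    # mimicking DataArray.coarsen(...).reduce(variogram)
    nlat, nlon = data.shape
    tiles = data.reshape(nlat // boxsize, boxsize, nlon // boxsize, boxsize)
    tiles = tiles.transpose(0, 2, 1, 3)  # -> (ny, nx, boxsize, boxsize)
    return variogram(tiles, lag=lag)

res = blockwise_variogram(np.arange(100.0).reshape(10, 10), boxsize=5, lag=2)
print(res.shape)  # (2, 2): one variogram value per 5x5 box
```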

We're already using numba in that project, so this would be fine.
