
Different results for numpy and dask arrays #11

Open
sfinkens opened this issue Aug 31, 2021 · 2 comments
Labels
bug Something isn't working

Comments


Describe the bug
The textory.statistics.variogram method yields different results for numpy and dask arrays.

To Reproduce

In [1]: from textory.statistics import variogram

In [2]: import dask.array as da

In [3]: arr = da.arange(100).reshape((10, 10))

In [4]: variogram(arr, lag=4).compute()
Out[4]: 401.475

In [5]: variogram(arr.compute(), lag=4)
Out[5]: 251.49

Expected behavior
The two results should be identical :)

Environment Info:

  • OS: Linux
  • Textory Version: 0.2.7b0
  • Dask Version: 2021.08.1
@sfinkens sfinkens added the bug Something isn't working label Aug 31, 2021
Owner

BENR0 commented Aug 31, 2021

@sfinkens thanks for reporting this. I have been aware of this bug to some degree. It is a little sloppy on my part, and I have been meaning for quite some time to document more thoroughly the limitations of how some of the calculations are implemented.

That said, in this case the culprit is how border values are treated by map_overlap in the dask case. Because the example uses arange, the difference between the two calculations becomes very obvious; with random values the difference gets smaller. Nevertheless, this is a bug.
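To illustrate the border effect with plain numpy (this is a minimal sketch, not textory's actual variogram implementation, and the numbers are for the sketch only): the `variogram_padded` variant mimics what a reflecting boundary in map_overlap does at the array edges, where synthetic padded pairs enter the mean, while `variogram_valid` only uses genuine pixel pairs.

```python
import numpy as np

def variogram_valid(x, lag):
    # use only genuine pixel pairs `lag` apart along the last axis
    d = x[:, lag:] - x[:, :-lag]
    return 0.5 * (d ** 2).mean()

def variogram_padded(x, lag):
    # pad the border by reflection first, similar to what a
    # boundary="reflect" overlap does, so synthetic edge pairs
    # enter the mean and shift the result
    xp = np.pad(x, ((0, 0), (lag, 0)), mode="reflect")
    d = xp[:, lag:] - xp[:, :-lag]
    return 0.5 * (d ** 2).mean()

x = np.arange(100.0).reshape(10, 10)
print(variogram_valid(x, 4))   # 8.0: every valid pair differs by exactly 4
print(variogram_padded(x, 4))  # 6.0: reflected border pairs pull the mean down
```

With random data the two values converge because the border pairs are a small, statistically similar fraction of all pairs, which matches the observation above.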

I have been thinking on and off about how to improve the accuracy of the calculations with regard to how borders are treated, but I haven't come up with a good approach yet that doesn't make the calculation significantly slower. I have a workaround in mind, at least for the functions in the statistics module, but I haven't implemented or tested it yet. How quickly would you need a solution?

Another question I have: how would you feel, from a user's point of view, about an additional dependency such as Numba?

Author

sfinkens commented Sep 7, 2021

Thanks for the feedback @BENR0! There's no hurry, we're still in a testing phase and evaluating the best solution for our application. That is, computing block-wise variograms like so:

data_array.coarsen(lon=boxsize, lat=boxsize).reduce(textory.statistics.variogram, lag=lag)
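For illustration, that coarsen/reduce pattern can be mimicked with plain numpy reshaping; the `variogram` below is a simplified stand-in for `textory.statistics.variogram` (not its actual definition), just to show the tiling:

```python
import numpy as np

def variogram(x, lag=2):
    # illustrative variogram along the last axis; textory's
    # actual definition may differ
    d = x[..., lag:] - x[..., :-lag]
    return 0.5 * (d ** 2).mean(axis=(-2, -1))

def blockwise_variogram(data, boxsize, lag):
    # split a (lat, lon) grid into boxsize x boxsize tiles,
    # mimicking DataArray.coarsen(...).reduce(variogram)
    nlat, nlon = data.shape
    tiles = data.reshape(nlat // boxsize, boxsize, nlon // boxsize, boxsize)
    tiles = tiles.transpose(0, 2, 1, 3)  # -> (ny, nx, boxsize, boxsize)
    return variogram(tiles, lag=lag)

res = blockwise_variogram(np.arange(100.0).reshape(10, 10), boxsize=5, lag=2)
print(res.shape)  # (2, 2): one variogram value per 5x5 box
```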

We're already using numba in that project, so this would be fine.
