Compute deterministic hash tokens of datasets in a collection #34

Closed
wants to merge 2 commits into from

Conversation

andersy005
Member

@andersy005 andersy005 commented Dec 21, 2021

In [1]: import xcollection as xc, xarray as xr

In [2]: ds = xr.DataArray(name='test')

In [3]: c = xc.Collection({'foo': xr.DataArray(name='foo'), 'bar': xr.DataArray(name='bar')})

In [4]: c
Out[4]: 
<Collection (2 keys)>
🔑 foo
<xarray.Dataset>
Dimensions:  ()
Data variables:
    foo      float64 nan

🔑 bar
<xarray.Dataset>
Dimensions:  ()
Data variables:
    bar      float64 nan

In [5]: c.tokens()
Out[5]: 
{'foo': 'b3833b28a0abdbd20ac2e504af93da29',
 'bar': '5628e75835f16e4d904f195bf7f661ac'}

In [6]: import dask.base

In [7]: c.tokens() == c.tokens()
Out[7]: True

In [8]: dask.base.tokenize(c)
Out[8]: '5b78ff477a522d18d12ac5b4939c71d3'

In [9]: dask.base.tokenize(c) == dask.base.tokenize(c)
Out[9]: True
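
The session above relies on two ideas: one token per key, and a collection-level token that `dask.base.tokenize` can pick up through the `__dask_tokenize__` protocol. The following is only a minimal sketch of that shape, not the code in this PR. `MiniCollection` and `stable_token` are hypothetical names, plain JSON-serializable values stand in for xarray datasets, and `hashlib.md5` over a sorted JSON dump is an assumed stand-in for whatever hashing the real implementation uses.

```python
import hashlib
import json


def stable_token(obj):
    """Hash a JSON-serializable object deterministically (hypothetical helper)."""
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


class MiniCollection:
    """Hypothetical stand-in for a keyed collection of datasets."""

    def __init__(self, datasets):
        self.datasets = dict(datasets)

    def tokens(self):
        # One token per key, mirroring the shape of the `c.tokens()` output.
        return {key: stable_token(value) for key, value in self.datasets.items()}

    def __dask_tokenize__(self):
        # dask.base.tokenize looks for this method on custom objects.
        # Sorting makes the result independent of key insertion order.
        return tuple(sorted(self.tokens().items()))


# Two collections with equal content but different key order.
c = MiniCollection({"foo": {"foo": [1.0, 2.0]}, "bar": {"bar": [3.0]}})
same = MiniCollection({"bar": {"bar": [3.0]}, "foo": {"foo": [1.0, 2.0]}})

print(c.tokens() == same.tokens())
print(c.__dask_tokenize__() == same.__dask_tokenize__())
```

Both comparisons hold here only because the stand-in values serialize the same way every time; whether real datasets do is a separate question.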

Contributor

@mgrover1 mgrover1 left a comment

Looks great! Thanks @andersy005 - the behavior from earlier today is still slightly concerning, but this seems like the best option/approach!

@andersy005
Member Author

> Looks great! Thanks @andersy005 - the behavior from earlier today is still slightly concerning, but this seems like the best option/approach!

After further local testing, I'm now less confident that this PR works as expected...

The inconsistency in xarray's tokenization seems to be the problem. Tokenization appears to be non-deterministic depending on the type of coordinates in a DataArray/Dataset. See my comment in pydata/xarray#4738 (comment)
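
One way to probe this kind of inconsistency is to build several equal objects independently and check whether they all receive the same token. The sketch below is an assumption-laden illustration, not a reproduction of the xarray behavior: `is_deterministic`, `content_token`, `identity_token`, and `make_coords` are hypothetical names, and the two toy tokenizers only contrast content-based hashing with identity-based hashing. It makes no claim about what causes the behavior in xarray.

```python
import hashlib


def is_deterministic(factory, tokenizer, repeats=3):
    """Return True if equal, independently built objects share one token."""
    # Keep every object alive at once so their ids are guaranteed to differ.
    objs = [factory() for _ in range(repeats)]
    tokens = {tokenizer(obj) for obj in objs}
    return len(tokens) == 1


def content_token(obj):
    # Hash only the content, so equal objects map to the same token.
    return hashlib.md5(repr(obj).encode("utf-8")).hexdigest()


def identity_token(obj):
    # Hash the object's identity, so every new instance gets a new token.
    return hashlib.md5(str(id(obj)).encode("utf-8")).hexdigest()


def make_coords():
    # Hypothetical stand-in for building a dataset with coordinates.
    return {"time": [0, 1, 2], "lat": [10.0, 20.0]}


print(is_deterministic(make_coords, content_token))
print(is_deterministic(make_coords, identity_token))
```

In practice, `dask.base.tokenize` could be passed as `tokenizer`, with a `factory` that constructs the DataArray or Dataset under test.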

@andersy005 andersy005 marked this pull request as draft December 21, 2021 18:18
@andersy005 andersy005 closed this Aug 11, 2022
Development

Successfully merging this pull request may close these issues.

Add tokens to the collections
2 participants