Problem when saving a dataframe that contains pyobjects #18

Open
gabrielelanaro opened this issue Sep 23, 2016 · 1 comment

gabrielelanaro commented Sep 23, 2016

I'm trying to save a dataframe that contains a "series of lists" (they correspond to ionic clusters), but serialization fails:

t = dtr.Treant('/tmp/hello')
t.data['hello'] = pd.DataFrame({ 'lists': [[0, 1, 2], [0, 1], [10, 22]] })

TypeError: Cannot serialize the column [lists] because
its data contents are [mixed] object dtype

I found that for dataframes the msgpack format is pretty robust and efficient. Maybe we could serialize dataframes using that?

It would, however, hurt backward compatibility.

dotsdl (Member) commented Sep 29, 2016

@gabrielelanaro this kind of DataFrame is not a good candidate for storage in HDF5 (as you found), but you could store it using datreant.data as a Python object by wrapping it in something that will trigger storage as a pickle. For example, you could do:

t = dtr.Treant('/tmp/hello')
t.data['hello'] = (pd.DataFrame({ 'lists': [[0, 1, 2], [0, 1], [10, 22]] }),)

which would make the stored object a tuple, so it gets pickled instead of being crammed into an HDF5 file.
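To make the wrapping idea concrete, here is a minimal sketch of the round trip using only the standard library. It does not call datreant at all: a plain dict stands in for the DataFrame so the snippet runs without pandas, and the names `clusters`, `wrapped`, and `recovered` are purely illustrative. The assumption is that whatever gets read back is the tuple itself, so the original object would be its only element.

```python
import pickle

# Stand-in for the DataFrame column from the issue: rows of unequal length.
# A plain dict is used here (an assumption for illustration) so the sketch
# runs without pandas installed.
clusters = {"lists": [[0, 1, 2], [0, 1], [10, 22]]}

# Wrapping in a one-element tuple mirrors the workaround above.
wrapped = (clusters,)

# Round-trip through pickle, the format used for generic Python objects.
payload = pickle.dumps(wrapped, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)

# Unwrap: the original object is the tuple's only element.
(recovered,) = restored
```

The ragged lists survive unchanged because pickle does not require a uniform column type; the cost is the extra unwrapping step on retrieval.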

I realize pickle is a poor format for data curation (not entirely safe, since deserialized objects could do nefarious things; not robust across versions of Python; etc.), but it is the lowest common denominator. We could consider using msgpack instead, since it's often used as a substitute for pickle, but I'm not familiar with it or the arguments for it.
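As an illustration of the safety concern, the sketch below follows the restricted-unpickler pattern described in the Python `pickle` documentation: overriding `Unpickler.find_class` so that only an allow-list of globals can be resolved. This is not something datreant.data does; `SAFE_BUILTINS`, `RestrictedUnpickler`, and `restricted_loads` are names chosen here for the example. A real DataFrame pickle would reference pandas and NumPy globals, so the allow-list would have to be extended before this could load one.

```python
import builtins
import io
import pickle

# Hypothetical allow-list: only these builtins may be resolved while loading.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not on the allow-list."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is forbidden"
        )


def restricted_loads(data):
    """Like pickle.loads, but with the restricted global lookup."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers need no global lookup, so they load normally.
plain = restricted_loads(pickle.dumps(([[0, 1, 2], [0, 1], [10, 22]],)))

# A pickle that references a callable such as eval is rejected on load.
try:
    restricted_loads(pickle.dumps(eval))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

This narrows what a hostile pickle can do, but it does not address the other concern raised above, compatibility across Python versions.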

Happy to shift how datreant.data works so long as we can maintain backwards compatibility for existing stores.
