I'm trying to save a DataFrame that contains a "series of lists" (they correspond to ionic clusters), but serialization fails:
t = dtr.Treant('/tmp/hello')
t.data['hello'] = pd.DataFrame({ 'lists': [[0, 1, 2], [0, 1], [10, 22]] })
TypeError: Cannot serialize the column [lists] because
its data contents are [mixed] object dtype
I found that the msgpack format is pretty robust and efficient for DataFrames. Maybe we could serialize DataFrames using that?
It would, however, break backward compatibility.
@gabrielelanaro this kind of DataFrame is not a good candidate for storage in HDF5 (as you found), but you could store it using datreant.data as a Python object by wrapping it in something that will trigger storage as a pickle. For example, you could wrap the DataFrame in a tuple, which would make the stored object a tuple, so it gets pickled instead of being crammed into an HDF5 file.
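A minimal sketch of the tuple-wrapping idea, using plain `pickle` as a stand-in for the pickle-backed storage. This is an assumption for illustration, not datreant's actual code path; the names `wrapped`, `payload`, and `restored` are hypothetical.

```python
import pickle

import pandas as pd

# The DataFrame from the report: one object-dtype column holding ragged lists.
df = pd.DataFrame({'lists': [[0, 1, 2], [0, 1], [10, 22]]})

# Wrapping in a one-element tuple means the object being stored is a tuple,
# not a DataFrame. With datreant the assignment would presumably be
# t.data['hello'] = wrapped (an assumption based on the description above).
wrapped = (df,)

# Stand-in for a pickle-backed store: serialize to bytes, then load back.
payload = pickle.dumps(wrapped)
(restored,) = pickle.loads(payload)

# The ragged lists survive the round trip unchanged.
roundtrip_ok = restored['lists'].tolist() == df['lists'].tolist()
print(roundtrip_ok)
```

When reading the data back, remember to unpack the tuple (as `(restored,) = ...` does here) to get the DataFrame itself.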
I realize pickle is a poor format for data curation (not entirely safe since deserialized objects could do nefarious things, not robust against versions of Python, etc.) but it is the lowest-common-denominator. We could consider using msgpack instead since it's often used as a substitute for pickle, but I'm not familiar with it or the arguments for it.
Happy to shift how datreant.data works so long as we can maintain backwards compatibility for existing stores.