Support large files #31

Open

jreadey opened this issue Aug 27, 2015 · 3 comments

Comments

@jreadey
Member

jreadey commented Aug 27, 2015

h5tojson.py and jsontoh5.py can't convert files whose size is comparable to the amount of physical memory on the machine the converter is running on.

@jreadey
Member Author

jreadey commented Aug 27, 2015

I'm tagging this as an "enhancement" rather than a bug since it was a known limitation of the design.

It may be worth investigating an alternative JSON parser such as ijson: https://pypi.python.org/pypi/ijson/.
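
For reference, ijson exposes an event-based streaming parser, so a large JSON file can be walked without loading the whole document into memory. A minimal sketch (the file name and key handling are illustrative, not tied to the hdf5-json layout):

```python
import ijson

# Walk a large JSON document as a stream of parse events instead of
# loading it whole with json.load(); memory use stays roughly constant.
with open("big_file.json", "rb") as f:
    for prefix, event, value in ijson.parse(f):
        # prefix is the dotted path to the current element; a "map_key"
        # event fires for every object key as it is encountered.
        if event == "map_key":
            print(prefix, value)
```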

Would it make more sense to tackle this using a native-C implementation of the conversion tools?

@ccoulombe

Any work towards this?

@jreadey
Member Author

jreadey commented Nov 27, 2019

Sort of... In HSDS we use what is basically the hdf5-json schema for metadata, but chunk data is stored as blobs. See https://github.com/HDFGroup/hsds/blob/master/docs/design/obj_store_schema/obj_store_schema_v2.md for a description. This works pretty well - we've used it for "files" as large as 50 TB. "Files" is in quotes since what you get at the end is a large collection of files in a tree structure.

This was done to support the HDF service, but the same approach could be used outside the server.
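
As a rough illustration of that split outside the server, here is a hypothetical sketch (the file layout, names, and row-block chunking are made up for this example and are not the HSDS object-store schema; it assumes a dataset with at least one dimension): dataset metadata goes into a small JSON document while the array data is streamed out to separate binary blobs.

```python
import json
import os

import h5py


def export_dataset(h5path, dset_name, out_dir, block_rows=1024):
    """Write dataset metadata as JSON and its raw data as separate blob files."""
    os.makedirs(out_dir, exist_ok=True)
    meta = {"name": dset_name, "chunks": []}
    with h5py.File(h5path, "r") as f:
        dset = f[dset_name]
        meta["shape"] = list(dset.shape)
        meta["dtype"] = str(dset.dtype)
        # Stream the dataset in row blocks so memory use stays bounded.
        for start in range(0, dset.shape[0], block_rows):
            stop = min(start + block_rows, dset.shape[0])
            blob = f"chunk_{start}_{stop}.bin"
            dset[start:stop].tofile(os.path.join(out_dir, blob))
            meta["chunks"].append({"rows": [start, stop], "blob": blob})
    # The JSON document stays small because it only points at the blobs.
    with open(os.path.join(out_dir, "dataset.json"), "w") as out:
        json.dump(meta, out, indent=2)
```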

What type of problem are you looking to solve?
