saving a model into HDF5 #9
Hi,
Is it possible to dump a model into an HDF5 file? The problem is that when I try to dump 1500 samples with your save_prefix function, the total disk space required is more than 10 GB, because CSV is not well suited to storing numerical values.

Comments
Hi, currently it is not possible. However, it would not be too difficult to implement. I could implement HDF5 support where each sample is dumped into a separate HDF5 file. Would that be useful for you?
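A minimal sketch of that idea (one HDF5 file per sample), assuming h5py is available and that a sample is a mapping from matrix name to NumPy array; the function and directory names are hypothetical, not the project's API:

```python
import os

import h5py
import numpy as np

def dump_sample(sample, index, outdir="samples_h5"):
    """Write one sample (a dict of matrices) into its own HDF5 file."""
    os.makedirs(outdir, exist_ok=True)
    path = os.path.join(outdir, f"sample_{index:05d}.h5")
    with h5py.File(path, "w") as f:
        for name, arr in sample.items():
            # Store each matrix as a float32 dataset to keep files small.
            f.create_dataset(name, data=np.asarray(arr, dtype=np.float32))
```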
I don't think that is a very good idea, because the number of files would be large and the write overhead would be significant. Would it be possible to collect all records and dump them together?
Unfortunately, that would only work for small models where all samples fit into memory. For larger cases, where a single sample takes a GB or more, it is not feasible to keep everything in memory.
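One way to reconcile the two concerns is to write samples incrementally into a single HDF5 file, one group per sample, so that only one sample is ever held in memory and only one file is created. This is a sketch under those assumptions, not something the project provides; `sample_iter` and the group naming are hypothetical:

```python
import h5py
import numpy as np

def dump_samples(sample_iter, path="samples.h5"):
    """Append each sample to its own group of a single HDF5 file."""
    with h5py.File(path, "w") as f:
        for i, sample in enumerate(sample_iter):
            grp = f.create_group(f"sample_{i:05d}")
            for name, arr in sample.items():
                # Compressed float32 datasets keep the on-disk size down.
                grp.create_dataset(name,
                                   data=np.asarray(arr, dtype=np.float32),
                                   compression="gzip")
```

Because the file is written group by group, memory usage stays bounded by the size of a single sample regardless of how many samples are dumped.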
Hi, would it be possible to return the model matrices directly from the …
Hi, …
Thanks for the quick response. I took a pipeline from someone that roughly followed your procedure (read the CSV files, use the model matrices, then delete the CSV files). I then modified the pipeline not to delete the model files (so I could use them later). The result was hilarious: after running the code in parallel across 80 jobs for 10 minutes, I exceeded my hard-disk quota on the cluster and then spent more than an hour deleting all the files it had created :). Thanks anyway, I will convert the CSV files to binary data and save them compressed; hopefully that will be smaller.
Using a binary format should reduce the required disk space, and float32 should have sufficient precision for storing the matrices. I hope this solves the issue :). The …
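For reference, a sketch of the CSV-to-binary conversion discussed above, assuming each sample was saved as a plain numeric CSV file; the file pattern and output name are hypothetical. Casting to float32 and enabling gzip compression is what typically shrinks the data:

```python
import glob
import os

import h5py
import numpy as np

with h5py.File("model_samples.h5", "w") as f:
    for path in sorted(glob.glob("sample_*.csv")):
        # Load the CSV matrix and downcast to float32 before storing.
        mat = np.loadtxt(path, delimiter=",").astype(np.float32)
        name = os.path.splitext(os.path.basename(path))[0]
        f.create_dataset(name, data=mat, compression="gzip", compression_opts=4)
```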