
Optimized Model Saving to/Loading from Disk #53

Open
MShinkle opened this issue Feb 29, 2024 · 1 comment

Comments

@MShinkle
Big fan of this project!

From a quick search through the code and the API docs, I haven't found any recommendations for optimal saving and loading of models after fitting. From my understanding, scikit-learn recommends saving via pickle or joblib, which naturally works for himalaya models as well.

Alternatively, this demo from the himalaya docs mentions that saved hyper-parameters can be used to reconstruct the full model. In the same vein, I've found that saving model coefficients as .npy, .npz, or .hdf is much faster, and takes up much less space, than full serialization via joblib.
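To illustrate the idea, here is a minimal sketch of saving only the fitted coefficients as a compressed `.npz` archive instead of pickling the whole estimator. The attribute name `coef_` follows the scikit-learn convention; whether this round-trips any particular himalaya model without also storing its hyper-parameters is an assumption, not something the docs guarantee.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
coef = rng.standard_normal((1000, 50))  # stand-in for a fitted model.coef_

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "coef.npz")
    # Save only the fitted parameters as a compressed archive,
    # instead of serializing the entire estimator object.
    np.savez_compressed(path, coef_=coef)

    # Later: reload the array and reattach it to a freshly
    # constructed estimator (e.g. model.coef_ = restored).
    with np.load(path) as archive:
        restored = archive["coef_"]

assert np.allclose(restored, coef)
```

For large coefficient matrices the compressed archive is typically a fraction of the size of a joblib dump, because none of the estimator's Python object graph is stored.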

Is there any interest in incorporating this functionality into himalaya? If so, and no one else wants to take it on, I could take a pass at it.

@mvdoc
Collaborator

mvdoc commented Feb 29, 2024

Hi Matt,

Functionality like that would be much appreciated! I'm not a big fan of dumping models with pickle/joblib because the result is rarely future-proof and, as you describe, the files produced are way too large. I think that dumping the minimal set of hyperparameters and parameters would be better. The functionality should also handle the different backends; see e.g. this issue: #52
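A backend-agnostic dump along those lines could normalize every parameter to a NumPy array before saving, so a model fit under a torch or cupy backend can be reloaded under any other. The helper below is a hypothetical sketch, not himalaya API; the duck-typed conversions (`.cpu()`/`.numpy()` for torch, `.get()` for cupy) are assumptions about those libraries' tensor interfaces.

```python
import numpy as np


def to_numpy(x):
    """Convert a backend tensor to a NumPy array (hypothetical helper)."""
    if hasattr(x, "cpu"):      # torch tensor: move to host memory first
        x = x.cpu()
    if hasattr(x, "get"):      # cupy array: .get() copies device -> host
        return x.get()
    if hasattr(x, "numpy"):    # torch CPU tensor
        return x.numpy()
    return np.asarray(x)       # already a NumPy array (or array-like)


def save_params(path, **params):
    """Dump named parameters as a compressed, backend-neutral .npz file."""
    np.savez_compressed(path, **{k: to_numpy(v) for k, v in params.items()})


def load_params(path):
    """Reload the saved parameters as a dict of NumPy arrays."""
    with np.load(path) as archive:
        return {k: archive[k] for k in archive.files}
```

On load, the arrays can then be cast back to whatever backend is currently active, which sidesteps the cross-backend unpickling problem mentioned in #52.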
