
Optimized Model Saving to/Loading from Disk #53

Open
MShinkle opened this issue Feb 29, 2024 · 1 comment

Comments

@MShinkle
Big fan of this project!

From a quick search through the code and the API docs, I haven't found any recommendations for optimal saving and loading of models after fitting. From my understanding, scikit-learn recommends saving via pickle or joblib, which naturally works for himalaya models as well.

Alternatively, this demo from the himalaya docs mentions that saved hyper-parameters can be used to reconstruct the full model. In the same vein, I've found that saving model coefficients as .npy, .npz, or .hdf is much faster, and takes up much less space, than full serialization via joblib.
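To illustrate the idea, here is a minimal sketch of saving only the fitted coefficients as a compressed `.npz` archive instead of pickling the whole estimator. The attribute name `coef_` follows the scikit-learn convention; whether this round-trips any particular himalaya model without also storing its hyper-parameters is an assumption, not something the docs guarantee.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
coef = rng.standard_normal((1000, 50))  # stand-in for a fitted model.coef_

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "coef.npz")
    # Save only the fitted parameters as a compressed archive,
    # instead of serializing the entire estimator object.
    np.savez_compressed(path, coef_=coef)

    # Later: reload the array and reattach it to a freshly
    # constructed estimator (e.g. model.coef_ = restored).
    with np.load(path) as archive:
        restored = archive["coef_"]

assert np.allclose(restored, coef)
```

For large coefficient matrices the compressed archive is typically a fraction of the size of a joblib dump, because none of the estimator's Python object graph is stored.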

Is there any interest in incorporating this functionality into himalaya? If so, and no one else wants to take it on, I could take a pass at it.

@mvdoc
Collaborator

mvdoc commented Feb 29, 2024

Hi Matt,

Functionality like that would be much appreciated! I'm not a big fan of dumping models with pickle/joblib because the result is rarely future-proof and, as you describe, the files produced are way too large. I think that dumping the minimal set of hyperparameters and parameters would be better. The functionality should also handle the different backends; see e.g. this issue: #52
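A backend-agnostic dump along those lines could normalize every parameter to a NumPy array before saving, so a model fit under a torch or cupy backend can be reloaded under any other. The helper below is a hypothetical sketch, not himalaya API; the duck-typed conversions (`.cpu()`/`.numpy()` for torch, `.get()` for cupy) are assumptions about those libraries' tensor interfaces.

```python
import numpy as np


def to_numpy(x):
    """Convert a backend tensor to a NumPy array (hypothetical helper)."""
    if hasattr(x, "cpu"):      # torch tensor: move to host memory first
        x = x.cpu()
    if hasattr(x, "get"):      # cupy array: .get() copies device -> host
        return x.get()
    if hasattr(x, "numpy"):    # torch CPU tensor
        return x.numpy()
    return np.asarray(x)       # already a NumPy array (or array-like)


def save_params(path, **params):
    """Dump named parameters as a compressed, backend-neutral .npz file."""
    np.savez_compressed(path, **{k: to_numpy(v) for k, v in params.items()})


def load_params(path):
    """Reload the saved parameters as a dict of NumPy arrays."""
    with np.load(path) as archive:
        return {k: archive[k] for k in archive.files}
```

On load, the arrays can then be cast back to whatever backend is currently active, which sidesteps the cross-backend unpickling problem mentioned in #52.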
