
llama : add ability to load model from memory buffer #9125

Draft
wants to merge 2 commits into
base: master
Conversation

@ngxson ngxson (Collaborator) commented Aug 21, 2024

TODO: add some tests

Currently, loading a model depends on the file system: the model must first be saved to disk, then loaded with llama_load_model_from_file.

However, when compiled to WebAssembly, the virtual FS implementation is very inefficient and does not support files larger than 2GB. Allowing a model to be loaded directly from a buffer bypasses the need for a file system.

This PR introduces 2 new APIs:

  • gguf_init_from_buffer ==> this was left as a TODO, so I implemented it
  • llama_load_model_from_buffers ==> supports loading multiple model shards (or model splits)

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Aug 21, 2024
@mofosyne mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label Aug 30, 2024
@sragrawal commented

@ngxson Any update on this? This would be very useful for users who are downloading the model from an object store bucket into memory.

@ngxson ngxson (Collaborator, Author) commented Oct 31, 2024

@sragrawal sorry for the slow response. I don't think the mentioned use case can benefit from the current PR.

Basically, you can make a tmpfs mount and download the file from the object store into that tmpfs. With this, everything is stored in RAM.
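A minimal sketch of that workaround, assuming a Linux host where /dev/shm is a RAM-backed tmpfs mounted by default (a dedicated mount would be `sudo mount -t tmpfs -o size=8G tmpfs /mnt/model-ram`):

```shell
# Stand-in for downloading the model from an object store (with curl or the
# cloud provider's CLI); here we just create a small placeholder file in RAM:
dd if=/dev/zero of=/dev/shm/model.gguf bs=1k count=1 2>/dev/null

# The file lives entirely in RAM but is reachable through the normal
# file-system API, so llama_load_model_from_file can open it unchanged:
ls -lh /dev/shm/model.gguf
```

The point is that no llama.cpp change is needed for the object-store scenario: the tmpfs makes the in-memory data look like an ordinary file.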

I think the use cases for this PR should be to support environments that have poor FS support, like wasm or sandboxed environments (e.g. iOS development). But these use cases are currently quite rare, so I can't invest much time in this.

Labels

  • ggml — changes relating to the ggml tensor library for machine learning
  • Review Complexity : Medium — generally require more time to grok but manageable by beginner to medium expertise level
3 participants