llama : add ability to load model from memory buffer #9125

ngxson · 2024-08-21T19:44:30Z

TODO: add some tests

Currently, loading a model depends on the file system. That means the model must first be saved to disk and then loaded using llama_load_model_from_file.

However, when compiled to WebAssembly, the virtual FS implementation is very inefficient and does not support files bigger than 2GB. Allowing a model to be loaded directly from a buffer bypasses the need for a file system.

This PR introduces 2 new APIs:

gguf_init_from_buffer ==> this was left as a TODO, so I implemented it
llama_load_model_from_buffers ==> supports loading multiple model shards (or model splits)
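To illustrate what reading GGUF data from memory rather than from a file involves, here is a minimal sketch in C. It is not the code from this PR and does not use the signatures of gguf_init_from_buffer or llama_load_model_from_buffers, which are not shown here. The names `buf_reader`, `buf_read`, `gguf_header_sketch` and `parse_gguf_header` are hypothetical. The sketch assumes the GGUF v2/v3 header layout (4-byte magic "GGUF", 32-bit version, 64-bit tensor count, 64-bit metadata KV count) and a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-size GGUF header fields (assumed GGUF v2/v3 layout, little-endian). */
typedef struct {
    uint32_t version;
    uint64_t n_tensors;
    uint64_t n_kv;
} gguf_header_sketch;

/* Cursor over a caller-owned memory buffer; plays the role a FILE* would. */
typedef struct {
    const uint8_t *data;
    size_t size;
    size_t offset;
} buf_reader;

/* Copy n bytes out of the buffer; fail rather than read out of bounds. */
static int buf_read(buf_reader *r, void *dst, size_t n) {
    if (n > r->size - r->offset) return 0;
    memcpy(dst, r->data + r->offset, n);
    r->offset += n;
    return 1;
}

/* Returns 1 on success, 0 if the buffer is truncated or the magic is wrong.
 * Hypothetical helper: assumes a little-endian host. */
static int parse_gguf_header(const void *data, size_t size, gguf_header_sketch *out) {
    buf_reader r = { (const uint8_t *) data, size, 0 };
    char magic[4];
    if (!buf_read(&r, magic, sizeof magic)) return 0;
    if (memcmp(magic, "GGUF", 4) != 0) return 0;
    if (!buf_read(&r, &out->version, sizeof out->version)) return 0;
    if (!buf_read(&r, &out->n_tensors, sizeof out->n_tensors)) return 0;
    if (!buf_read(&r, &out->n_kv, sizeof out->n_kv)) return 0;
    return 1;
}
```

The same cursor idea extends to multiple shards by keeping one reader per buffer; how the PR actually organizes this may differ.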

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

sragrawal · 2024-09-22T23:42:46Z

@ngxson Any update on this? This would be very useful for users who are downloading the model from an object store bucket into memory.

ngxson · 2024-10-31T13:08:19Z

@sragrawal sorry for the slow response. I don't think the use case you mention would benefit from the current PR.

Basically, you can make a tmpfs mount and download the file from the object store into that tmpfs. With this, everything will be stored in RAM.
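As a rough sketch of the tmpfs approach, the hypothetical C helper below (`write_buffer_to_dir` is not part of llama.cpp) writes an in-memory buffer to a file in a given directory and returns the resulting path. It assumes `dir` points at a tmpfs mount; on many Linux systems `/dev/shm` is one, but that depends on the system setup.

```c
#include <stddef.h>
#include <stdio.h>

/* Write `size` bytes from `data` to "<dir>/<name>" and store the path in
 * `out_path`. Returns 1 on success, 0 on failure.
 * Assumption: `dir` is a tmpfs mount, so the file contents stay in RAM. */
static int write_buffer_to_dir(const char *dir, const char *name,
                               const void *data, size_t size,
                               char *out_path, size_t out_path_size) {
    int n = snprintf(out_path, out_path_size, "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= out_path_size) return 0; /* path did not fit */

    FILE *f = fopen(out_path, "wb");
    if (!f) return 0;

    size_t written = fwrite(data, 1, size, f);
    int ok = (written == size);
    if (fclose(f) != 0) ok = 0;
    if (!ok) remove(out_path); /* do not leave a partial file behind */
    return ok;
}
```

The path written to `out_path` can then be passed to llama_load_model_from_file as usual.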

I think the use cases for this PR should be to support environments that have poor FS support, like wasm or sandboxed environments (e.g. iOS development). But these use cases are currently quite rare, so I can't invest much time in that.

ngxson added 2 commits August 21, 2024 20:37

llama : load model from buffer

112b664

llama_load_model_from_buffers

ad1af06

github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) Aug 21, 2024

mofosyne added the "Review Complexity : Medium" label (Generally require more time to grok but manageable by beginner to medium expertise level) Aug 30, 2024
