Releases · ngxson/wllama

New features

downloadModel()

Downloads a model to the cache without loading it. This lets an application offer a "model manager" screen where the user can:

Download a model via downloadModel()
List all downloaded models using CacheManager.list()
Delete a downloaded model using CacheManager.delete()

KV cache reuse in createCompletion

When calling createCompletion, you can pass useCache: true as an option. This reuses the KV cache from the previous createCompletion call. It is equivalent to the cache_prompt option of the llama.cpp server.

wllama.createCompletion(input, {
  useCache: true,
  ...
});

For example:

On the first call, you have 2 messages: user: hello, assistant: hi
On the second call, you add one message: user: hello, assistant: hi, user: who are you?

Then, only the added message user: who are you? will need to be evaluated.
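
The idea behind this reuse can be sketched as a longest-common-prefix comparison between the previous prompt and the new one. This is a conceptual illustration only, not wllama's actual implementation: the helper names commonPrefixLength and planEvaluation are hypothetical, and whole messages stand in for tokens, whereas a real engine would compare token IDs.

```javascript
// Hypothetical helper: counts how many leading items two sequences share.
function commonPrefixLength(cachedTokens, newTokens) {
  const limit = Math.min(cachedTokens.length, newTokens.length);
  let i = 0;
  while (i < limit && cachedTokens[i] === newTokens[i]) i++;
  return i;
}

// Hypothetical helper: splits a new prompt into the part already covered
// by the cache and the part that still has to be evaluated.
function planEvaluation(cachedTokens, newTokens) {
  const reused = commonPrefixLength(cachedTokens, newTokens);
  return { reused, toEvaluate: newTokens.slice(reused) };
}

// Toy "tokens": one string per message, purely for illustration.
const firstCall = ['user: hello', 'assistant: hi'];
const secondCall = ['user: hello', 'assistant: hi', 'user: who are you?'];

const plan = planEvaluation(firstCall, secondCall);
console.log(plan.reused);     // 2
console.log(plan.toEvaluate); // [ 'user: who are you?' ]
```

If the new prompt diverges from the cached one partway through (for example, an earlier message was edited), only the shared prefix counts as reusable in this sketch, and everything after the first difference is evaluated again.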

What's Changed

Add downloadModel function by @ngxson in #95
fix log print and downloadModel by @ngxson in #100
Add main example (chat UI) by @ngxson in #99
Improve main UI example by @ngxson in #102
implement KV cache reuse by @ngxson in #103

Full Changelog: 1.14.2...1.15.0
