
b3484 #272

Merged
Nexesenex merged 3 commits into Nexesenex:spacestream on Jul 28, 2024

Conversation

Nexesenex (Owner)

No description provided.

yeahdongcn and others added 3 commits July 28, 2024 01:41
* Update doc for MUSA

Signed-off-by: Xiaodong Ye <[email protected]>

* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <[email protected]>

* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <[email protected]>

* CUDA => MUSA

Signed-off-by: Xiaodong Ye <[email protected]>

* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <[email protected]>
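
For context, __vsubss4 is a CUDA device intrinsic that subtracts four packed signed 8-bit lanes with saturation; backends that lack it typically emulate it. A minimal host-side C++ sketch of those semantics, written for this note rather than taken from the PR:

```cpp
#include <cstdint>

// Reference semantics of __vsubss4: per-lane int8 subtraction that
// clamps to [-128, 127] instead of wrapping around.
static int32_t vsubss4_ref(int32_t a, int32_t b) {
    uint32_t out = 0;
    for (int i = 0; i < 4; ++i) {
        const int8_t av = (int8_t)(a >> (8 * i));  // lane i of a
        const int8_t bv = (int8_t)(b >> (8 * i));  // lane i of b
        int d = (int)av - (int)bv;
        if (d >  127) d =  127;
        if (d < -128) d = -128;
        out |= (uint32_t)(uint8_t)d << (8 * i);
    }
    return (int32_t)out;
}
```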

* Fix CI build failure

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>

* llama : refactor session file management

* llama : saving and restoring state checks for overflow

The sizes of the buffers are now passed to the functions working with
them; otherwise, a truncated file could cause out-of-bounds reads.
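
A minimal sketch of that idea (struct and member names hypothetical, not the PR's actual code): every read goes through a helper that tracks the bytes remaining, so a truncated file fails cleanly rather than reading past the end.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical bounds-checked reader: the remaining size travels with
// the pointer, so an over-long read throws instead of overrunning.
struct state_read_buffer {
    const uint8_t * ptr;
    size_t          size_left;

    void read_raw(void * dst, size_t n) {
        if (n > size_left) {
            throw std::runtime_error("unexpected end of session data");
        }
        std::memcpy(dst, ptr, n);
        ptr       += n;
        size_left -= n;
    }
};
```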

* llama : stream from session file instead of copying into a big buffer

Loading session files should no longer cause a memory usage spike.
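
The same read interface can then be backed by the file itself (again a hypothetical sketch), consuming data incrementally instead of copying it into one large allocation first:

```cpp
#include <cstdio>
#include <stdexcept>

// Hypothetical file-backed reader with the same read_raw() shape:
// session data streams straight from disk, no big intermediate buffer.
struct state_read_file {
    FILE * f;

    void read_raw(void * dst, size_t n) {
        if (n > 0 && std::fread(dst, 1, n, f) != n) {
            throw std::runtime_error("failed to read from session file");
        }
    }
};
```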

* llama : llama_state_get_size returns the actual size instead of max

This is a breaking change, but it makes that function *much* easier
to keep up to date, and it brings its behavior in line with
llama_state_seq_get_size.
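
A usage sketch of the new contract, assuming the post-change signatures in which llama_state_get_size returns the exact size and llama_state_get_data takes the destination capacity:

```cpp
#include <cstdint>
#include <vector>

#include "llama.h"

// With an exact size, the buffer is allocated to fit rather than to a
// worst-case estimate.
std::vector<uint8_t> save_state(llama_context * ctx) {
    std::vector<uint8_t> buf(llama_state_get_size(ctx));
    const size_t written = llama_state_get_data(ctx, buf.data(), buf.size());
    buf.resize(written);  // should already match, but keep it exact
    return buf;
}
```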

* llama : share code between whole and seq_id-specific state saving

Both session file types now use a more similar format.

* llama : no longer store all hparams in session files

Instead, the model arch name is stored.
The layer count and the embedding dimensions of the KV cache
are still verified when loading.
Storing all the hparams is not necessary.
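
In sketch form (helper and field names hypothetical), the load-time check then reduces to comparing the stored arch name plus the few KV-cache-relevant dimensions:

```cpp
#include <cstdint>
#include <string>

// Hypothetical header check: the arch name stands in for the full set
// of hparams, while the KV cache shape is still verified explicitly.
bool session_header_ok(const std::string & file_arch, uint32_t file_n_layer,
                       uint32_t file_n_embd_k,        uint32_t file_n_embd_v,
                       const std::string & model_arch, uint32_t n_layer,
                       uint32_t n_embd_k,              uint32_t n_embd_v) {
    return file_arch     == model_arch  // stored instead of all hparams
        && file_n_layer  == n_layer     // layer count
        && file_n_embd_k == n_embd_k    // K embedding size per layer
        && file_n_embd_v == n_embd_v;   // V embedding size per layer
}
```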

* llama : fix uint64_t format type

* llama : various integer type cast and format string fixes

Some platforms use "%lu" and others "%llu" for uint64_t. There is no
obviously portable way to handle that, so the values are cast to size_t
when displayed in error messages.
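
For illustration, the two usual portable options; the commit takes the size_t cast, the <cinttypes> macro being the common alternative:

```cpp
#include <cinttypes>
#include <cstdio>

int main() {
    const uint64_t n_read = 123456789ULL;
    std::printf("read %zu bytes\n", (size_t) n_read);  // cast, as in the commit
    std::printf("read %" PRIu64 " bytes\n", n_read);   // <cinttypes> macro
    return 0;
}
```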

* llama : remove _context suffix for llama_data_context

* llama : fix session file loading

llama_state_get_size cannot be used to get the max size anymore.

* llama : more graceful error handling of invalid session files

* llama : remove LLAMA_MAX_RNG_STATE

It's no longer necessary to limit the size of the RNG state,
because the max size of session files is not estimated anymore.

* llama : cast seq_id in comparison with unsigned n_seq_max

…CLI options (#8477)

* chore: Fix compiler warnings, add help text, improve CLI options

* Add prototypes for function definitions
* Invert logic of --no-clean option to be more intuitive (see the sketch after this list)
* Provide a new help prompt with clear instructions
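
A tiny sketch of the inversion (parsing code hypothetical; only the --no-clean name comes from the commit): the stored boolean is the positive notion and defaults to true, so the flag simply opts out instead of reading as a double negative.

```cpp
#include <cstring>

struct gen_options {
    bool clean = true;  // default behavior; --no-clean opts out
};

void apply_arg(gen_options & opts, const char * arg) {
    if (std::strcmp(arg, "--no-clean") == 0) {
        opts.clean = false;  // inverted once, at the parsing boundary
    }
}
```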

* chore : Add ignore rule for vulkan shader generator

Signed-off-by: teleprint-me <[email protected]>

* Update ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp

Co-authored-by: 0cc4m <[email protected]>

* chore : Remove void and apply C++ style empty parameters

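For illustration (function name hypothetical): in C++ the two spellings declare the same function, and the commit standardizes on the empty parameter list.

```cpp
// Before: C-style declaration of a function taking no arguments.
void print_usage(void);

// After: C++ style; identical meaning in C++.
void print_usage();
```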

---------

Signed-off-by: teleprint-me <[email protected]>
Co-authored-by: 0cc4m <[email protected]>
@Nexesenex Nexesenex merged commit 54ef11a into Nexesenex:spacestream Jul 28, 2024
18 of 26 checks passed
@github-actions bot added the documentation, Nvidia GPU, examples, and ggml labels on Jul 28, 2024