forked from LostRuins/koboldcpp
b3524 #279
Merged
Conversation
* Don't ignore llama.cpp params
* Add a fallback for max_tokens
This commit moves the comment for the c parameter from ggml_rope to ggml_rope_ext. The comment was misplaced: ggml_rope does not take a c parameter (the freq_factors tensor); only ggml_rope_ext does. Signed-off-by: Daniel Bevenius <[email protected]>
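For context, a rough sketch of the two declarations, paraphrased from ggml.h around this release; treat the exact parameter lists as an approximation rather than a verbatim copy:

```c
// ggml_rope: plain RoPE, no frequency-factors tensor.
struct ggml_tensor * ggml_rope(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,       // activations to rotate
        struct ggml_tensor  * b,       // token positions
        int                   n_dims,
        int                   mode);

// ggml_rope_ext: extended RoPE that accepts c, the optional
// freq_factors tensor, plus NTK/YaRN-style scaling parameters.
struct ggml_tensor * ggml_rope_ext(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b,
        struct ggml_tensor  * c,       // freq_factors, may be NULL
        int n_dims, int mode, int n_ctx_orig,
        float freq_base, float freq_scale, float ext_factor,
        float attn_factor, float beta_fast, float beta_slow);
```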
* Fix Vulkan repeat op
* Implement Vulkan concat op
* Delete old Vulkan shader generator
* Implement Vulkan im2col op
* Implement Vulkan unary gelu_quick op
* Implement Vulkan group_norm op
* Implement Vulkan timestep_embedding op
* Implement Vulkan upscale op
* Fix Vulkan vk_context tensor extra index issue
* Fix Vulkan matmul shader parameter bug
* Properly fix Vulkan matmul shader parameter bug
* Add Vulkan ADD f16 + f32 -> f16 operator support
* Implement Vulkan tanh op
* Fix Vulkan group count too large validation error on non-Nvidia GPUs
* Throw an error when too much memory is requested
* Fix another Vulkan group count too large validation error on non-Nvidia GPUs
* Fix matmul MMQ condition
* Implement Vulkan pad op
* Fix Vulkan crash when a tensor is used multiple times in a compute graph
* Add Vulkan CONCAT f16 + f16 -> f16 op
* Add Vulkan LEAKY_RELU op

ggml-ci
* Fix Vulkan mul mat vec invalid results when ncols < warp size
* Only run the backend ops mul mat vec block size test if the block size is not already covered
* vulkan-shaders: attempt to fix compilation on Windows
* Fix mismatched parenthesis
… Llama 3.1 tool call support (#8858)
* gguf-py, llama : add constants and methods related to the Llama-3.1 <|eom_id|> token
* llama : find the Llama-3.1 <|eom_id|> token id during vocab loading (see the sketch below)
* llama-vocab : add the Llama-3.1 <|eom_id|> token to the set of tokens that stop generation
Co-authored-by: Stanisław Szymczyk <[email protected]>
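A minimal illustration of the idea, assuming a toy vocab structure; all names here are hypothetical, not llama.cpp's actual types:

```c
#include <stdint.h>
#include <string.h>

// Hypothetical minimal vocab, for illustration only.
typedef struct {
    const char ** token_text; // token id -> piece text
    int32_t       n_tokens;
    int32_t       eom_id;     // -1 when the model has no <|eom_id|>
} toy_vocab;

// At load time, scan the vocab once and remember the <|eom_id|> id.
static void toy_find_eom(toy_vocab * vocab) {
    vocab->eom_id = -1;
    for (int32_t id = 0; id < vocab->n_tokens; ++id) {
        if (strcmp(vocab->token_text[id], "<|eom_id|>") == 0) {
            vocab->eom_id = id;
            break;
        }
    }
}

// During generation, treat <|eom_id|> as end-of-generation
// (the real check would also cover EOS/EOT, omitted here).
static int toy_is_eog(const toy_vocab * vocab, int32_t id) {
    return vocab->eom_id != -1 && id == vocab->eom_id;
}
```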
* py: add more authorship metadata from model card
* fixup! py: add more authorship metadata from model card
It's helpful to use expm1f(x), because computing expf(x)-1 directly loses precision to catastrophic cancellation for roughly 25% of single-precision floating-point numbers (inputs near zero, where expf(x) rounds to almost exactly 1).
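A minimal standalone demonstration of that cancellation (plain C, independent of the project code):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    // ulp(1.0f) is about 1.2e-7, so expf(1e-8f) rounds to exactly 1.0f
    // and the subtraction cancels to zero; expm1f keeps full precision.
    float x = 1e-8f;
    printf("expf(x) - 1 = %g\n", (double)(expf(x) - 1.0f)); // prints 0
    printf("expm1f(x)   = %g\n", (double)expm1f(x));        // ~1e-08
    return 0;
}
```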
ramalama is a repo-agnostic, boring CLI tool that supports pulling models from Ollama, Hugging Face, and OCI registries. Signed-off-by: Eric Curtin <[email protected]>