merge upstream #42

l3utterfly · 2024-10-10T02:44:13Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

* Add scaffolding for ggml logging macros
* Metal backend now uses GGML logging
* Cuda backend now uses GGML logging
* Cann backend now uses GGML logging
* Add enum tag to parameters
* Use C memory allocation funcs
* Fix compile error
* Use GGML_LOG instead of GGML_PRINT
* Rename llama_state to llama_logger_state
* Prevent null format string
* Fix whitespace
* Remove log callbacks from ggml backends
* Remove cuda log statement

* Single allocation of encode_async block with non-ARC capture in ggml-metal.m
* Moving Block_release to the deallocation code
* Release encode block when re-setting encoding buffer count if needed
* Update ggml/src/ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <[email protected]>

* ggml : add metal backend registry / device ggml-ci
* metal : fix names [no ci]
* metal : global registry and device instances ggml-ci
* cont : alternative initialization of global objects ggml-ci
* llama : adapt to backend changes ggml-ci
* fixes
* metal : fix indent
* metal : fix build when MTLGPUFamilyApple3 is not available ggml-ci
* fix merge
* metal : avoid unnecessary singleton accesses ggml-ci
* metal : minor fix [no ci]
* metal : g_state -> g_ggml_ctx_dev_main [no ci]
* metal : avoid reference of device context in the backend context ggml-ci
* metal : minor [no ci]
* metal : fix maxTransferRate check
* metal : remove transfer rate stuff

---------

Co-authored-by: slaren <[email protected]>

Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/bcef6817a8b2aa20a5a6dbb19b43e63c5bf8619a?narHash=sha256-HO4zgY0ekfwO5bX0QH/3kJ/h4KvUDFZg8YpkNwIbg1U%3D' (2024-09-12)
  → 'github:hercules-ci/flake-parts/3d04084d54bedc3d6b8b736c70ef449225c361b1?narHash=sha256-K5ZLCyfO/Zj9mPFldf3iwS6oZStJcU4tSpiXTMYaaL0%3D' (2024-10-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'https://github.com/NixOS/nixpkgs/archive/356624c12086a18f2ea2825fed34523d60ccc4e3.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01)
  → 'https://github.com/NixOS/nixpkgs/archive/fb192fec7cc7a4c26d51779e9bab07ce6fa5597a.tar.gz?narHash=sha256-0xHYkMkeLVQAMa7gvkddbPqpxph%2BhDzdu1XdGPJR%2BOs%3D' (2024-10-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/1925c603f17fc89f4c8f6bf6f631a802ad85d784?narHash=sha256-J%2BPeFKSDV%2BpHL7ukkfpVzCOO7mBSrrpJ3svwBFABbhI%3D' (2024-09-26)
  → 'github:NixOS/nixpkgs/bc947f541ae55e999ffdb4013441347d83b00feb?narHash=sha256-NOiTvBbRLIOe5F6RbHaAh6%2B%2BBNjsb149fGZd1T4%2BKBg%3D' (2024-10-04)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* ggml : do not use BLAS with types without to_float
* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies
* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits
  it's not really internal if everybody uses it

OuadiElfarouki and others added 30 commits October 3, 2024 07:50

Fixed dequant precision issues in Q4_1 and Q5_1 (ggerganov#9711)

5639971

rpc : enable vulkan (ggerganov#9714)

841713e

closes ggerganov#8536

convert : handle tokenizer merges format from transformers 4.45 (ggerganov#9696)

e3c355b

ggml-backend : add device description to CPU backend (ggerganov#9720)

a7ad553

metal : fix compute pass descriptor autorelease crash (ggerganov#9718)

5d5ab1e

ggml: refactor cross entropy loss CPU impl. (ggml/976)

eee39bd

ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980)

fabdc3b

sync : ggml

1bb8a64

metal : remove abort (skip) (ggml/0)

d5ed2b9

Fixed RNG seed docs (ggerganov#9723)

133c7b4

* Update README.md fixed RNG seed info
* changed print format to unsigned

ci : fine-grant permission (ggerganov#9710)

f3fdcfa

ggml : fixes after sync (ggml/983)

ff56576

ggml : remove test-backend-buffer
ggml : fix CUDA build warnings

ggml : fix typo in example usage ggml_gallocr_new (ggml/984)

55951c0

sync : ggml

1788077

Add Llama Assistant (ggerganov#9744)

71967c2

metal : zero-init buffer contexts (whisper/0)

905f548

sync : ggml

58b1669

rerank : use [SEP] token instead of [BOS] (ggerganov#9737)

8c475b9

* rerank : use [SEP] token instead of [BOS] ggml-ci
* common : sanity check for non-NULL tokens ggml-ci
* ci : adjust rank score interval ggml-ci
* ci : add shebang to run.sh ggml-ci

vulkan : retry allocation with fallback flags (whisper/2451)

b0915d5

Co-authored-by: Samuel Morris <[email protected]>

sync : llama.cpp

b6d6c52

readme : fix typo [no ci]

f4b2dcd

contrib : simplify + minor edits [no ci]

d5cb868

Update building for Android (ggerganov#9672)

f1af42f

* docs : clarify building Android on Termux
* docs : update building Android on Termux
* docs : add cross-compiling for Android
* cmake : link dl explicitly for Android

ggml : add backend registry / device interfaces to BLAS backend (ggerganov#9752)

6374743

* ggml : add backend registry / device interfaces to BLAS backend
* fix mmap usage when using host buffers

scripts : fix spelling typo in messages and comments (ggerganov#9782)

fa42aa6

Signed-off-by: Masanari Iida <[email protected]>

server : better security control for public deployments (ggerganov#9776)

458367a

* server : more explicit endpoint access settings
* protect /props endpoint
* fix tests
* update server docs
* fix typo
* fix tests

slaren and others added 4 commits October 8, 2024 14:21

examples : remove llama.vim

3dc48fe

An updated version will be added in ggerganov#9787

perplexity : fix integer overflow (ggerganov#9783)

e702206

* perplexity : fix integer overflow ggml-ci
* perplexity : keep n_vocab as int and make appropriate casts ggml-ci

cmake : do not build common library by default when standalone (ggerganov#9804)

c81f3bb

l3utterfly merged commit 2f7d58e into layla-build Oct 10, 2024
72 of 85 checks passed

github-actions bot added the documentation, SYCL, Nvidia GPU, Vulkan, testing, build, examples, devops, python, android, server, ggml, Apple Metal, script, and nix labels Oct 10, 2024
