b2586 #103

Nexesenex · 2024-04-03T06:39:39Z

No description provided.

* Add MobileVLM_V2 backup * Update MobileVLM-README.md * Update examples/llava/MobileVLM-README.md Co-authored-by: Georgi Gerganov <[email protected]> * Update examples/llava/convert-image-encoder-to-gguf.py Co-authored-by: Georgi Gerganov <[email protected]> * clip : fix whitespace --------- Co-authored-by: Georgi Gerganov <[email protected]>

This reverts commit f8c4e74.

* server: version bump for httplib and json * fix build * bring back content_length

* cuda : refactor to remove global resources

* Add MobileVLM_V2 backup * Update MobileVLM-README.md * Update examples/llava/MobileVLM-README.md Co-authored-by: Georgi Gerganov <[email protected]> * Update examples/llava/convert-image-encoder-to-gguf.py Co-authored-by: Georgi Gerganov <[email protected]> * clip : fix whitespace * fix definition mistake in clip.cpp --------- Co-authored-by: Georgi Gerganov <[email protected]>

* k_cache: be able to use Q5_0 * k_cache: be able to use Q5_1 on CUDA * k_cache: be able to use Q5_0 on Metal * k_cache: be able to use Q5_1 on Metal * k_cache: be able to use IQ4_NL - just CUDA for now * k_cache: be able to use IQ4_NL on Metal * k_cache: add newly added supported types to llama-bench and CUDA supports_op --------- Co-authored-by: Iwan Kawrakow <[email protected]>

* Initial commit - add mac prebuilds. * forward contribution credits for building the workflow. * minor : remove trailing whitespaces --------- Co-authored-by: Nicolas Patry <[email protected]> Co-authored-by: Georgi Gerganov <[email protected]>

* json: fix arrays (disallow `[,1]`) * json: support tuple types (`[number, string]`) * json: support additionalProperties (`{[k: string]: [string,number][]}`) * json: support required / optional properties * json: add support for pattern * json: resolve $ref (and support https schema urls) * json: fix $ref resolution * json: support union types (mostly for nullable types I think) * json: support allOf + nested anyOf * json: support any (`{}` or `{type: object}`) * json: fix merge * json: temp fix for escapes * json: spaces in output and unrestricted output spaces * json: add typings * json: fix typo * Create ts-type-to-grammar.sh * json: fix _format_literal (json.dumps already escapes quotes) * json: merge lit sequences and handle negatives {"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"} * json: handle pattern repetitions * Update json-schema-to-grammar.mjs * Create regex-to-grammar.py * json: extract repeated regexp patterns to subrule * Update json-schema-to-grammar.py * Update json-schema-to-grammar.py * Update json-schema-to-grammar.py * json: handle schema from pydantic Optional fields * Update json-schema-to-grammar.py * Update json-schema-to-grammar.py * Update ts-type-to-grammar.sh * Update ts-type-to-grammar.sh * json: simplify nullable fields handling * json: accept duplicate identical rules * json: revert space to 1 at most * json: reuse regexp pattern subrules * json: handle uuid string format * json: fix literal escapes * json: add --allow-fetch * json: simplify range escapes * json: support negative ranges in patterns * Delete commit.txt * json: custom regex parser, adds dot support & JS-portable * json: rm trailing spaces * Update json-schema-to-grammar.mjs * json: updated server & chat `( cd examples/server && ./deps.sh )` * json: port fixes from mjs to python * Update ts-type-to-grammar.sh * json: support prefixItems alongside array items * json: add date format + fix uuid * json: add date, time, date-time formats * json: preserve order of props from TS defs * json: port schema converter to C++, wire in ./server * json: nits * Update json-schema-to-grammar.cpp * Update json-schema-to-grammar.cpp * Update json-schema-to-grammar.cpp * json: fix mjs implementation + align outputs * Update json-schema-to-grammar.mjs.hpp * json: test C++, JS & Python versions * json: nits + regen deps * json: cleanup test * json: revert from c++17 to 11 * json: nit fixes * json: dirty include for test * json: fix zig build * json: pass static command to std::system in tests (fixed temp files) * json: fix top-level $refs * json: don't use c++20 designated initializers * nit * json: basic support for reserved names `{number:{number:{root:number}}}` * Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test) * json: re-ran server deps.sh * json: simplify test * json: support mix of additional props & required/optional * json: add tests for some expected failures * json: fix type=const in c++, add failure expectations for non-str const&enum * json: test (& simplify output of) empty schema * json: check parsing in test + fix value & string refs * json: add server tests for OAI JSON response_format * json: test/fix top-level anyOf * json: improve grammar parsing failures * json: test/fix additional props corner cases * json: fix string patterns (was missing quotes) * json: ws nit * json: fix json handling in server when there's no response_format * json: catch schema conversion errors in server * json: don't complain about unknown format type in server if unset * json: cleaner build of test * json: create examples/json-schema-pydantic-example.py * json: fix date pattern * json: move json.hpp & json-schema-to-grammar.{cpp,h} to common * json: indent 4 spaces * json: fix naming of top-level c++ function (+ drop unused one) * json: avoid using namespace std * json: fix zig build * Update server.feature * json: iostream -> fprintf * json: space before & refs for consistency * json: nits
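The first item in the list above, "fix arrays (disallow `[,1]`)", comes down to allowing a comma only between two items. The following is a minimal sketch of that idea, not the converter's actual code: `array_rule` and `array_regex` are hypothetical helpers, and the rule names `space` and `number` are assumptions made for illustration.

```python
import re

def array_rule(item: str) -> str:
    """Return a GBNF-style rule body for a JSON array of `item`.

    The first item is optional as a whole; every later item must be
    preceded by a comma, so a leading comma such as `[,1]` cannot match.
    """
    return f'"[" space ( {item} ( "," space {item} )* )? "]" space'

def array_regex(item_pattern: str) -> "re.Pattern[str]":
    """The same shape written as a regex, only to sanity-check the idea."""
    ws = r"\s*"
    return re.compile(
        rf"\[{ws}(?:{item_pattern}(?:{ws},{ws}{item_pattern})*)?{ws}\]"
    )

if __name__ == "__main__":
    print(array_rule("number"))
    numbers = array_regex(r"-?\d+")
    for text in ["[]", "[1]", "[1, 2, 3]", "[,1]", "[1,]"]:
        print(text, bool(numbers.fullmatch(text)))
```

Because the comma lives inside the repeated group together with the item that follows it, neither a leading nor a trailing comma can be produced.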

* Make quantize_row_iq4_nl do the same thing as quantization on CUDA * Make quantize_row_iq4_nl do the same thing as quantization on CUDA This time for real. backend-ops tests pass. * Now fix test-quantize-fns --------- Co-authored-by: Iwan Kawrakow <[email protected]>

ggml-ci

The stated file `./devops/main-server.Dockerfile` does not exist. I figure that `.devops/server-intel.Dockerfile` was meant.

* Fix params underscore convert to dash. * Update common/common.cpp --------- Co-authored-by: slaren <[email protected]>

* metal : require ne00 >= 128 for mat-mat kernels ggml-ci * llama : pad n_ctx by 32 ggml-ci

* metal : proper assert for mat-mat memory alignment ggml-ci * readme : add notice about the bug fix * metal : fix the fix ggml-ci

Take all dependencies from the cross stage, rather than only stdenv

* doc: fix outdated default value of batch size * doc: add doc for ubatch-size

* fix empty bug * Update MobileVLM-README.md added more results on devices * Update MobileVLM-README.md * Update MobileVLM-README.md * Update MobileVLM-README.md * Update MobileVLM-README.md * Update MobileVLM-README.md * Update MobileVLM-README.md * Update examples/llava/MobileVLM-README.md Co-authored-by: Georgi Gerganov <[email protected]> * Update MobileVLM-README.md remove gguf links --------- Co-authored-by: Georgi Gerganov <[email protected]>

* Revisited & updated SYCL build documentation * removed outdated comment * Addressed PR comments * Trimmed whitespace * added new end line

* Allow conversion of Mistral HF models * Homogenize Llama, Mistral, Mixtral under the same entry. * Fix tokenizer, permute tensors * Use sentencepiece tokenizer, or fall back to hfft. * convert-hf : small fix for mypy * convert-hf : fix duplicated block_count * convert-hf : add vocab size to metadata --------- Co-authored-by: Jared Van Bortel <[email protected]>

* llama: remove redundant reshape in build_kv_store

This commit removes the reshape of the V matrix in build_kv_store. The motivation for this is that the V matrix has the shape:

```console
(gdb) p *v_cur
$46 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU, buffer = 0x0,
  ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608, 8388608},
  op = GGML_OP_MUL_MAT, op_params = { 0 <repeats 16 times>}, flags = 0, grad = 0x0,
  src = {0xb496b0, 0x7ffef1c40950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
  perf_runs = 0, perf_cycles = 0, perf_time_us = 0, view_src = 0x0, view_offs = 0,
  data = 0x0, name = "Vcur-0", '\000' <repeats 57 times>, extra = 0x0,
  padding = "\000\000\000\000\000\000\000"}
```

And after reshaping this tensor we get:

```console
(gdb) p *ggml_reshape_2d(ctx, v_cur, n_embd_v_gqa, n_tokens)
$44 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU, buffer = 0x0,
  ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608, 8388608},
  op = GGML_OP_RESHAPE, op_params = { 0 <repeats 16 times>}, flags = 0, grad = 0x0,
  src = {0x7ffef1c40e00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
  perf_runs = 0, perf_cycles = 0, perf_time_us = 0, view_src = 0x7ffef1c40e00, view_offs = 0,
  data = 0x0, name = "Vcur-0 (reshaped)", '\000' <repeats 46 times>, extra = 0x0,
  padding = "\000\000\000\000\000\000\000"}
```

I noticed that the `src` and `view_src` fields are different but that the dimensions are the same. From the code comment it seems like the reshape call is not needed and perhaps the above can motivate the removal of the reshape call.

Signed-off-by: Daniel Bevenius <[email protected]>

* llama : add assert

---------

Signed-off-by: Daniel Bevenius <[email protected]> Co-authored-by: Georgi Gerganov <[email protected]>
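The observation in the commit message can be checked with a little arithmetic. The sketch below is not ggml code: `contiguous_strides` and `reshape_2d` are hypothetical helpers that only model the `ne` (dims) and `nb` (byte strides) fields of a densely packed F32 tensor, and the values 4096 and 512 for `n_embd_v_gqa` and `n_tokens` are read off the second gdb dump.

```python
F32_SIZE = 4  # bytes per GGML_TYPE_F32 element

def contiguous_strides(ne, type_size=F32_SIZE):
    """Byte strides (ggml's `nb`) for a densely packed tensor with dims `ne`."""
    nb = [type_size]
    for dim in ne[:-1]:
        nb.append(nb[-1] * dim)
    return nb

def reshape_2d(ne, ne0, ne1):
    """Dims after viewing the same elements as a 2-D (ne0, ne1) tensor."""
    total = 1
    for dim in ne:
        total *= dim
    if ne0 * ne1 != total:
        raise ValueError("reshape must keep the element count")
    return [ne0, ne1, 1, 1]

if __name__ == "__main__":
    v_cur_ne = [4096, 512, 1, 1]        # `ne` from the first gdb dump
    n_embd_v_gqa, n_tokens = 4096, 512  # read off the second gdb dump
    reshaped_ne = reshape_2d(v_cur_ne, n_embd_v_gqa, n_tokens)
    print(reshaped_ne == v_cur_ne)
    print(contiguous_strides(v_cur_ne))
    print(contiguous_strides(reshaped_ne))
```

Under these assumptions the reshaped dims and strides are identical to the input's, which matches the two dumps: only the bookkeeping fields (`op`, `src`, `view_src`, `name`) change.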

* cmake: add explicit metal version options * Update CMakeLists.txt --------- Co-authored-by: Georgi Gerganov <[email protected]>

* readme: add Android UI binding * Update README.md

ggml-ci

* Support xverse model convert to gguf format. * 1. Convert xverse models to gguf; 2. Add LLM_ARCH_XVERSE inference in llama.cpp; 3. Add xverse item in Supported models in README.md; * gguf-py: remove redundant logs * llama: remove the init_mapping_prefetch custom parameter * llama.cpp: Include the changes from #6122 to exclude the unused outputs of the last layers. * - Fix format issues - Remove duplicate set kqv_out to llm_build_kv * Update llama.cpp --------- Co-authored-by: willhe <[email protected]> Co-authored-by: willhe <[email protected]>

* sync : ggml ggml-ci * cuda : move GGML_CUDA_DMMV constants to dmmv.cuh --------- Co-authored-by: slaren <[email protected]>

* Fix Vulkan no kv offload incoherence * Add k-quant mul mat mat shaders * Rework working buffer allocation, reduces vram use noticeably Clean up cpu assist code, replaced with ggml-backend offload function * Default to all dedicated GPUs * Add fallback for integrated GPUs if no dedicated GPUs are found * Add debug info which device is allocating memory * Fix Intel dequant issue Fix validation issue * Fix Vulkan GGML_OP_GET_ROWS implementation * Clean up merge artifacts * Remove Vulkan warning

* split by max size * clean up arg parse * split: ok * add dry run option * error on 0 tensors * be positive * remove next_metadata_size

* fixed deprecated address * fixed deprecated address * fixed deprecated address * Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions * Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions * Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions * reverted back to only the MIT license

* ci: server: verify deps are coherent with the commit * ci: server: change the ref to build as now it's a pull event target

Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/44d0940ea560dee511026a53f0e2e2cde489b4d4' (2024-03-23) → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* disable iqx on windows as WA * array instead of global_memory

jkarthic and others added 30 commits March 20, 2024 12:02

Server: Handle n_keep parameter in the request (#6174)

47cc7a7

Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)"

d795988

This reverts commit f8c4e74.

server : allow to override -ngl in tests (#6170)

bc0baab

gitignore : ignore curl-related files

6b7e76d

Server: version bump for httplib and json (#6169)

91f8ad1

* server: version bump for httplib and json * fix build * bring back content_length

cuda : refactor to remove global resources (#6170)

ccf58aa

* cuda : refactor to remove global resources

llava : update MobileVLM-README.md (#6180)

f9c7ba3

cuda : print the returned error when CUDA initialization fails (#6185)

1c51f98

cuda : fix conflict with std::swap (#6186)

42e21c6

Add nvidia and amd backends (#6157)

c5b8595

ci : fix indentation error (#6195)

1943c01

cuda : fix LLAMA_CUDA_F16 build (#6197)

03a8f8f

tests : disable system() calls (#6198)

924ce1d

ggml-ci

Corrected typo to wrong file (#6199)

f372c49

The stated file `./devops/main-server.Dockerfile` does not exist. I figure that `.devops/server-intel.Dockerfile` was meant.

cuda : disable host register by default (#6206)

d0a7123

server : update readme doc from slot_id to id_slot (#6213)

be07a03

Fix params underscore convert to dash. (#6203)

fa046ea

* Fix params underscore convert to dash. * Update common/common.cpp --------- Co-authored-by: slaren <[email protected]>
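The commit messages do not say what exactly was wrong with the underscore-to-dash conversion. As a minimal sketch, assuming the intent is to rewrite only option names (tokens starting with `--`) and to leave values such as file names alone, the normalization could look like this. `normalize_args` is a hypothetical helper, not the function in `common/common.cpp`.

```python
def normalize_args(argv):
    """Replace '_' with '-' in long option names, leaving other tokens untouched."""
    normalized = []
    for token in argv:
        if token.startswith("--"):
            # Only option names are rewritten, e.g. --ctx_size -> --ctx-size.
            token = token.replace("_", "-")
        normalized.append(token)
    return normalized

if __name__ == "__main__":
    print(normalize_args(["--ctx_size", "2048", "-m", "my_model.gguf"]))
```

Restricting the rewrite to tokens with the `--` prefix keeps a value like `my_model.gguf` intact while still accepting both spellings of an option.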

add blog link (#6222)

59c17f0

metal : pad n_ctx by 32 (#6177)

95d576b

* metal : require ne00 >= 128 for mat-mat kernels ggml-ci * llama : pad n_ctx by 32 ggml-ci
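The commit only states that `n_ctx` is padded by 32. Reading that as "round up to the next multiple of 32" (an assumption), the arithmetic can be sketched as follows; `pad_to_multiple` is a hypothetical name.

```python
def pad_to_multiple(value: int, multiple: int = 32) -> int:
    """Round `value` up to the nearest multiple of `multiple`."""
    if value < 0 or multiple <= 0:
        raise ValueError("value must be >= 0 and multiple must be > 0")
    return (value + multiple - 1) // multiple * multiple

if __name__ == "__main__":
    for n_ctx in (1000, 1024, 1025):
        print(n_ctx, pad_to_multiple(n_ctx))
```

A value that is already a multiple of 32 is returned unchanged, so the padding never grows the context by more than 31 entries.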

ci : add CURL flag for the mac builds (#6214)

b2075fd

metal : proper assert for mat-mat memory alignment (#6225)

b3e94f2

* metal : proper assert for mat-mat memory alignment ggml-ci * readme : add notice about the bug fix * metal : fix the fix ggml-ci

server : enable continuous batching by default (#6231)

68e210b

server : fix n_keep always showing as 0 in response (#6211)

6b8bb3a

readme : add RecurseChat to the list of UIs (#6219)

29ab270

SomeoneSerge and others added 29 commits March 28, 2024 07:48

nix: .#windows: proper cross-compilation set-up

e9f17dc

Take all dependencies from the cross stage, rather than only stdenv

only using explicit blas if hostPlatform is allowed

dbb03e2

using blas.meta.available to check host platform

c873976

nix: moved blas availability check to package inputs so it is still overridable

d39b308

nix: removed unnecessary indentation

d2d8f38

server : stop gracefully on SIGTERM (#6348)

6902cb7

doc: fix outdated default value of batch size (#6336)

cfc4d75

* doc: fix outdated default value of batch size * doc: add doc for ubatch-size

ci: bench: fix master not schedule, fix commit status failed on external repo (#6365)

28cb9a0

llama : fix command-r inference when omitting outputs (#6367)

0308f5e

convert : refactor vocab selection logic (#6355)

be55134

[SYCL] Revisited & updated SYCL build documentation (#6141)

5106ef4

* Revisited & updated SYCL build documentation * removed outdated comment * Addressed PR comments * Trimmed whitespace * added new end line

readme : add notice for UI list

bfe7daf

cmake : add explicit metal version options (#6370)

8093987

* cmake: add explicit metal version options * Update CMakeLists.txt --------- Co-authored-by: Georgi Gerganov <[email protected]>

readme : add project (#6356)

b910287

* readme: add Android UI binding * Update README.md

ci : fix BGE wget (#6383)

cfde806

ggml-ci

sync : ggml (#6351)

d48ccf3

* sync : ggml ggml-ci * cuda : move GGML_CUDA_DMMV constants to dmmv.cuh --------- Co-authored-by: slaren <[email protected]>

split: allow --split-max-size option (#6343)

f7fc5f6

* split by max size * clean up arg parse * split: ok * add dry run option * error on 0 tensors * be positive * remove next_metadata_size
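The steps listed in this commit (split by max size, a dry-run option, an error on 0 tensors) suggest a greedy planning pass. The following is only a sketch of that idea under simplifying assumptions: tensors are represented by their byte sizes, metadata overhead is ignored, and `plan_splits` is a hypothetical helper rather than the tool's real logic.

```python
def plan_splits(tensor_sizes, max_bytes):
    """Group consecutive tensor sizes into shards of at most `max_bytes` each.

    A single tensor larger than `max_bytes` still gets a shard of its own.
    """
    if not tensor_sizes:
        raise ValueError("nothing to split: 0 tensors")
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    shards, current, used = [], [], 0
    for size in tensor_sizes:
        # Start a new shard when adding this tensor would exceed the limit.
        if current and used + size > max_bytes:
            shards.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    shards.append(current)
    return shards

if __name__ == "__main__":
    # A "dry run" in this sketch just prints the plan without writing files.
    for index, shard in enumerate(plan_splits([40, 40, 40, 100, 10], 100), 1):
        print(f"shard {index}: {shard} ({sum(shard)} bytes)")
```

Computing the plan separately from writing the output is what makes a dry-run mode cheap: the same function serves both paths.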

ci: bench: fix Resource not accessible by integration on PR event (#6393)

37e7854

readme : update hot topics

c50a82c

ci: server: verify deps are coherent with the commit (#6409)

226e819

* ci: server: verify deps are coherent with the commit * ci: server: change the ref to build as now it's a pull event target

compare-llama-bench.py: fix long hexsha args (#6424)

33a5244

[SYCL] Disable iqx on windows as WA (#6435)

5260486

* disable iqx on windows as WA * array instead of global_memory

Nexesenex merged commit ccf447b into Nexesenex:sidestream Apr 3, 2024
11 of 13 checks passed
