b2532 #102

Merged

Changes from all commits
64 commits
47cc7a7
Server: Handle n_keep parameter in the request (#6174)
jkarthic Mar 20, 2024
f8c4e74
llava : add a MobileVLM_V2-1.7B backup (#6152)
ZiangWu-77 Mar 20, 2024
d795988
Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)"
ggerganov Mar 20, 2024
bc0baab
server : allow to override -ngl in tests (#6170)
ggerganov Mar 20, 2024
6b7e76d
gitignore : ignore curl-related files
ggerganov Mar 20, 2024
91f8ad1
Server: version bump for httplib and json (#6169)
ngxson Mar 20, 2024
ccf58aa
cuda : refactor to remove global resources (#6170)
slaren Mar 20, 2024
272935b
llava : add MobileVLM_V2 backup (#6175)
ZiangWu-77 Mar 20, 2024
f9c7ba3
llava : update MobileVLM-README.md (#6180)
ZiangWu-77 Mar 20, 2024
1c51f98
cuda : print the returned error when CUDA initialization fails (#6185)
slaren Mar 20, 2024
42e21c6
cuda : fix conflict with std::swap (#6186)
slaren Mar 21, 2024
c5b8595
Add nvidia and amd backends (#6157)
AidanBeltonS Mar 21, 2024
76aa30a
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)
ikawrakow Mar 21, 2024
5e43ba8
build : add mac pre-build binaries (#6182)
Vaibhavs10 Mar 21, 2024
1943c01
ci : fix indentation error (#6195)
Vaibhavs10 Mar 21, 2024
5b7b0ac
json-schema-to-grammar improvements (+ added to server) (#5978)
ochafik Mar 21, 2024
cfd3be7
ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196)
ikawrakow Mar 21, 2024
03a8f8f
cuda : fix LLAMA_CUDA_F16 build (#6197)
slaren Mar 21, 2024
924ce1d
tests : disable system() calls (#6198)
ggerganov Mar 21, 2024
f372c49
Corrected typo to wrong file (#6199)
semidark Mar 21, 2024
d0a7123
cuda : disable host register by default (#6206)
slaren Mar 21, 2024
be07a03
server : update readme doc from `slot_id` to `id_slot` (#6213)
kaetemi Mar 21, 2024
fa046ea
Fix params underscore convert to dash. (#6203)
dranger003 Mar 22, 2024
59c17f0
add blog link (#6222)
NeoZhangJianyu Mar 22, 2024
95d576b
metal : pad n_ctx by 32 (#6177)
ggerganov Mar 22, 2024
b2075fd
ci : add CURL flag for the mac builds (#6214)
Vaibhavs10 Mar 22, 2024
b3e94f2
metal : proper assert for mat-mat memory alignment (#6225)
ggerganov Mar 22, 2024
68e210b
server : enable continuous batching by default (#6231)
ggerganov Mar 22, 2024
6b8bb3a
server : fix n_keep always showing as 0 in response (#6211)
kaetemi Mar 22, 2024
29ab270
readme : add RecurseChat to the list of UIs (#6219)
xyc Mar 22, 2024
2f0e81e
cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy…
slaren Mar 22, 2024
72114ed
json-schema-to-grammar : fix order of props + non-str const/enum (#6232)
ochafik Mar 22, 2024
f77a8ff
tests : conditional python & node json schema tests (#6207)
ochafik Mar 22, 2024
e80f06d
llama : correction of the attn.v.weight quantization for IQ3_XS (#6209)
Nexesenex Mar 22, 2024
80bd33b
common : add HF arg helpers (#6234)
ggerganov Mar 22, 2024
ee804f6
ci: apply concurrency limit for github workflows (#6243)
mscheong01 Mar 22, 2024
dba1af6
llama_model_loader: support multiple split/shard GGUFs (#6187)
phymbert Mar 22, 2024
1d0331c
quantize: options for output and token embedding tensors qtype (#6239)
ikawrakow Mar 22, 2024
92397d8
convert-llama2c-to-ggml : enable conversion of GQA models (#6237)
fraxy-v Mar 22, 2024
56a00f0
common : default --hf-file to --model (#6234)
ggerganov Mar 22, 2024
50ccaf5
lookup: complement data from context with general text statistics (#5…
JohannesGaessler Mar 23, 2024
1b26aeb
server: flush stdout after logging in both text and json layout (#6253)
phymbert Mar 23, 2024
21cad01
split: add gguf-split in the make build target (#6262)
phymbert Mar 23, 2024
476b025
llama : add grok-1 support (#6204)
arki05 Mar 23, 2024
1997577
server: docs: `--threads` and `--threads`, `--ubatch-size`, `--log-di…
phymbert Mar 23, 2024
f482bb2
common: llama_load_model_from_url split support (#6192)
phymbert Mar 23, 2024
9556217
gitignore : gguf-split
ggerganov Mar 23, 2024
94d1b3b
use _wfopen instead of fopen on Windows (#6248)
cebtenzzre Mar 23, 2024
d03224a
Support build win release for SYCL (#6241)
NeoZhangJianyu Mar 24, 2024
ddf6568
[SYCL] offload op (#6217)
airMeng Mar 24, 2024
586e7bc
sampling : deduplicated code for probability distribution access (#6240)
mscheong01 Mar 24, 2024
ea279d5
ci : close inactive issue, increase operations per run (#6270)
phymbert Mar 24, 2024
7aed0ff
Fixed lookup compilation issues on Windows (#6273)
JohannesGaessler Mar 24, 2024
a0e584d
imatrix : fix wname for mul_mat_id ops (#6271)
ggerganov Mar 24, 2024
a32b77c
Fix heap corruption from wmode out-of-bound writes on windows (#6272)
TheFlipbook Mar 24, 2024
7733f0c
ggml : support AVX512VNNI (#6280)
jart Mar 25, 2024
64e7b47
examples : add "retrieval" (#6193)
mscheong01 Mar 25, 2024
95ad616
[SYCL] fix SYCL backend build on windows is break by LOG() error (#6290)
NeoZhangJianyu Mar 25, 2024
ad3a050
Server: clean up OAI params parsing function (#6284)
ngxson Mar 25, 2024
ae1f211
cuda : refactor into multiple files (#6269)
slaren Mar 25, 2024
2f34b86
cuda : fix LLAMA_CUDA_F16 build (#6298)
slaren Mar 25, 2024
43139cc
flake.lock: Update (#6266)
ggerganov Mar 25, 2024
1f2fd4e
tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303)
ikawrakow Mar 25, 2024
b06c16e
nix: fix blas support (#6281)
ck3d Mar 25, 2024
1 change: 1 addition & 0 deletions .clang-tidy
@@ -12,6 +12,7 @@ Checks: >
-readability-implicit-bool-conversion,
-readability-magic-numbers,
-readability-uppercase-literal-suffix,
-readability-simplify-boolean-expr,
clang-analyzer-*,
-clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
performance-*,
6 changes: 3 additions & 3 deletions .devops/nix/package.nix
@@ -10,7 +10,7 @@
git,
python3,
mpi,
openblas, # TODO: Use the generic `blas` so users could switch between alternative implementations
blas,
cudaPackages,
darwin,
rocmPackages,
@@ -181,6 +181,7 @@ effectiveStdenv.mkDerivation (
++ optionals useMpi [ mpi ]
++ optionals useOpenCL [ clblast ]
++ optionals useRocm rocmBuildInputs
++ optionals useBlas [ blas ]
++ optionals useVulkan vulkanBuildInputs;

cmakeFlags =
@@ -216,8 +217,7 @@ effectiveStdenv.mkDerivation (
# Should likely use `rocmPackages.clr.gpuTargets`.
"-DAMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102"
]
++ optionals useMetalKit [ (lib.cmakeFeature "CMAKE_C_FLAGS" "-D__ARM_FEATURE_DOTPROD=1") ]
++ optionals useBlas [ (lib.cmakeFeature "LLAMA_BLAS_VENDOR" "OpenBLAS") ];
++ optionals useMetalKit [ (lib.cmakeFeature "CMAKE_C_FLAGS" "-D__ARM_FEATURE_DOTPROD=1") ];

# TODO(SomeoneSerge): It's better to add proper install targets at the CMake level,
# if they haven't been added yet.
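The package.nix hunks above replace the hard-coded `openblas` input with the generic `blas` attribute (and drop the `LLAMA_BLAS_VENDOR=OpenBLAS` CMake flag), so consumers can choose a BLAS implementation at the nixpkgs level. A minimal sketch of how a consumer might pin OpenBLAS explicitly — the overlay itself is illustrative and not part of this PR:

```nix
# Hypothetical overlay: point the generic `blas` attribute at OpenBLAS so
# derivations built with `useBlas` (like this package) link against it.
final: prev: {
  blas = prev.blas.override { blasProvider = prev.openblas; };
}
```

Because the derivation now consumes `blas` rather than `openblas` directly, switching providers no longer requires patching the package itself.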
168 changes: 167 additions & 1 deletion .github/workflows/build.yml
@@ -15,14 +15,133 @@ on:
types: [opened, synchronize, reopened]
paths: ['**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m']

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

env:
BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
GGML_NLOOP: 3
GGML_N_THREADS: 1

jobs:
macOS-latest-cmake-arm64:
runs-on: macos-14

steps:
- name: Clone
id: checkout
uses: actions/checkout@v3

- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update

- name: Build
id: cmake_build
run: |
sysctl -a
mkdir build
cd build
cmake -DLLAMA_FATAL_WARNINGS=ON -DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_CURL=ON ..
cmake --build . --config Release -j $(sysctl -n hw.logicalcpu)

- name: Test
id: cmake_test
run: |
cd build
ctest -L main --verbose --timeout 900

- name: Determine tag name
id: tag
shell: bash
run: |
BUILD_NUMBER="$(git rev-list --count HEAD)"
SHORT_HASH="$(git rev-parse --short=7 HEAD)"
if [[ "${{ env.BRANCH_NAME }}" == "master" ]]; then
echo "name=b${BUILD_NUMBER}" >> $GITHUB_OUTPUT
else
SAFE_NAME=$(echo "${{ env.BRANCH_NAME }}" | tr '/' '-')
echo "name=${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}" >> $GITHUB_OUTPUT
fi

- name: Pack artifacts
id: pack_artifacts
if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
run: |
cp LICENSE ./build/bin/
zip -r llama-${{ steps.tag.outputs.name }}-bin-macos-arm64.zip ./build/bin/*

- name: Upload artifacts
if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
uses: actions/upload-artifact@v3
with:
path: |
llama-${{ steps.tag.outputs.name }}-bin-macos-arm64.zip

macOS-latest-cmake-x64:
runs-on: macos-latest

steps:
- name: Clone
id: checkout
uses: actions/checkout@v3

- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update

- name: Build
id: cmake_build
run: |
sysctl -a
mkdir build
cd build
cmake -DLLAMA_FATAL_WARNINGS=ON -DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_CURL=ON ..
cmake --build . --config Release -j $(sysctl -n hw.logicalcpu)

- name: Test
id: cmake_test
run: |
cd build
ctest -L main --verbose --timeout 900

- name: Determine tag name
id: tag
shell: bash
run: |
BUILD_NUMBER="$(git rev-list --count HEAD)"
SHORT_HASH="$(git rev-parse --short=7 HEAD)"
if [[ "${{ env.BRANCH_NAME }}" == "master" ]]; then
echo "name=b${BUILD_NUMBER}" >> $GITHUB_OUTPUT
else
SAFE_NAME=$(echo "${{ env.BRANCH_NAME }}" | tr '/' '-')
echo "name=${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}" >> $GITHUB_OUTPUT
fi

- name: Pack artifacts
id: pack_artifacts
if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
run: |
cp LICENSE ./build/bin/
zip -r llama-${{ steps.tag.outputs.name }}-bin-macos-x64.zip ./build/bin/*

- name: Upload artifacts
if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
uses: actions/upload-artifact@v3
with:
path: |
llama-${{ steps.tag.outputs.name }}-bin-macos-x64.zip

ubuntu-focal-make:
runs-on: ubuntu-20.04
env:
LLAMA_NODE_AVAILABLE: true
LLAMA_PYTHON_AVAILABLE: true

steps:
- name: Clone
@@ -35,6 +154,14 @@ jobs:
sudo apt-get update
sudo apt-get install build-essential gcc-8

- uses: actions/setup-node@v4
with:
node-version: "20"

- uses: actions/setup-python@v4
with:
python-version: "3.11"

- name: Build
id: make_build
env:
@@ -98,6 +225,17 @@ jobs:
cd build
ctest -L main --verbose --timeout 900

- name: Test llama2c conversion
id: llama2c_test
run: |
cd build
echo "Fetch tokenizer"
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories260K/tok512.bin
echo "Fetch llama2c model"
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories260K/stories260K.bin
./bin/convert-llama2c-to-ggml --copy-vocab-from-model ./tok512.bin --llama2c-model stories260K.bin --llama2c-output-model stories260K.gguf
./bin/main -m stories260K.gguf -p "One day, Lily met a Shoggoth" -n 500 -c 256

# ubuntu-latest-cmake-sanitizer:
# runs-on: ubuntu-latest
#
@@ -662,6 +800,7 @@ jobs:

windows-latest-cmake-sycl:
runs-on: windows-latest

defaults:
run:
shell: bash
@@ -670,7 +809,6 @@
WINDOWS_BASEKIT_URL: https://registrationcenter-download.intel.com/akdlm/IRC_NAS/62641e01-1e8d-4ace-91d6-ae03f7f8a71f/w_BaseKit_p_2024.0.0.49563_offline.exe
WINDOWS_DPCPP_MKL: intel.oneapi.win.cpp-dpcpp-common:intel.oneapi.win.mkl.devel


steps:
- name: Clone
id: checkout
@@ -685,6 +823,32 @@
id: cmake_build
run: examples/sycl/win-build-sycl.bat

- name: Determine tag name
id: tag
shell: bash
run: |
BUILD_NUMBER="$(git rev-list --count HEAD)"
SHORT_HASH="$(git rev-parse --short=7 HEAD)"
if [[ "${{ env.BRANCH_NAME }}" == "master" ]]; then
echo "name=b${BUILD_NUMBER}" >> $GITHUB_OUTPUT
else
SAFE_NAME=$(echo "${{ env.BRANCH_NAME }}" | tr '/' '-')
echo "name=${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}" >> $GITHUB_OUTPUT
fi

- name: Pack artifacts
id: pack_artifacts
if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
run: |
7z a llama-${{ steps.tag.outputs.name }}-bin-win-sycl-x64.zip .\build\bin\*

- name: Upload artifacts
if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
uses: actions/upload-artifact@v3
with:
path: |
llama-${{ steps.tag.outputs.name }}-bin-win-sycl-x64.zip

ios-xcode-build:
runs-on: macos-latest

@@ -748,6 +912,8 @@ jobs:
- macOS-latest-cmake
- windows-latest-cmake
- windows-latest-cmake-cublas
- macOS-latest-cmake-arm64
- macOS-latest-cmake-x64

steps:
- name: Clone
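The `Determine tag name` step added to each new job in build.yml computes a release tag from the branch name and git metadata. The same logic can be sketched as a standalone shell function, with placeholder values standing in for the live `git rev-list` / `git rev-parse` output:

```shell
#!/usr/bin/env sh
# Sketch of the workflow's tag-name logic: builds on master are named
# "b<commit-count>"; branch builds are "<branch, slashes dashed>-b<count>-<hash>".
tag_name() {
  branch=$1
  build_number=$2   # in CI: $(git rev-list --count HEAD)
  short_hash=$3     # in CI: $(git rev-parse --short=7 HEAD)
  if [ "$branch" = "master" ]; then
    echo "b${build_number}"
  else
    safe_name=$(echo "$branch" | tr '/' '-')
    echo "${safe_name}-b${build_number}-${short_hash}"
  fi
}

tag_name master 2532 47cc7a7        # -> b2532
tag_name feature/curl 2532 47cc7a7  # -> feature-curl-b2532-47cc7a7
```

This naming is why the PR title is "b2532": it is the master-branch build number at the merge commit.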
2 changes: 1 addition & 1 deletion .github/workflows/close-issue.yml
@@ -19,5 +19,5 @@ jobs:
close-issue-message: "This issue was closed because it has been inactive for 14 days since being marked as stale."
days-before-pr-stale: -1
days-before-pr-close: -1
operations-per-run: 1000
operations-per-run: 10000
repo-token: ${{ secrets.GITHUB_TOKEN }}
4 changes: 4 additions & 0 deletions .github/workflows/code-coverage.yml
@@ -5,6 +5,10 @@ env:
GGML_NLOOP: 3
GGML_N_THREADS: 1

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
run:
runs-on: ubuntu-20.04
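Most of the workflow files touched below receive the identical stanza rolled out by the "ci: apply concurrency limit for github workflows" commit (#6243). The pattern keys each run on the workflow name plus the git ref, so a newer push to the same ref cancels the in-flight run instead of queuing behind it:

```yaml
# One concurrency group per workflow+ref; a newer run cancels the older one.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```

Applying it per-workflow (rather than once globally) keeps, say, a docker build from cancelling an unrelated lint run on the same branch.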
4 changes: 4 additions & 0 deletions .github/workflows/docker.yml
@@ -15,6 +15,10 @@ on:
branches:
- master

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
push_to_registry:
name: Push Docker image to Docker Hub
4 changes: 4 additions & 0 deletions .github/workflows/editorconfig.yml
@@ -14,6 +14,10 @@ on:
branches:
- master

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
editorconfig:
runs-on: ubuntu-latest
4 changes: 4 additions & 0 deletions .github/workflows/nix-ci-aarch64.yml
@@ -17,6 +17,10 @@ on:
types: [opened, synchronize, reopened]
paths: ['**/*.nix', 'flake.lock']

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
nix-build-aarch64:
runs-on: ubuntu-latest
4 changes: 4 additions & 0 deletions .github/workflows/nix-ci.yml
@@ -8,6 +8,10 @@ on:
pull_request:
types: [opened, synchronize, reopened]

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
nix-eval:
strategy:
4 changes: 4 additions & 0 deletions .github/workflows/python-check-requirements.yml
@@ -16,6 +16,10 @@ on:
- 'requirements.txt'
- 'requirements/*.txt'

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
python-check-requirements:
runs-on: ubuntu-latest
4 changes: 4 additions & 0 deletions .github/workflows/python-lint.yml
@@ -2,6 +2,10 @@ name: flake8 Lint

on: [push, pull_request]

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
flake8-lint:
runs-on: ubuntu-latest
5 changes: 4 additions & 1 deletion .github/workflows/server.yml
@@ -18,6 +18,10 @@ on:
schedule:
- cron: '0 0 * * *'

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
server:
runs-on: ubuntu-latest
@@ -31,7 +35,6 @@ jobs:
include:
- build_type: Release
sanitizer: ""
disabled_on_pr: true
fail-fast: false # While -DLLAMA_SANITIZE_THREAD=ON is broken

container:
4 changes: 4 additions & 0 deletions .github/workflows/zig-build.yml
@@ -6,6 +6,10 @@ on:
branches:
- master

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
build:
strategy: