Ag indirect copy dest #292

Merged
94 commits
6eeaeba
cmake: use 1 more thread for non-ggml in CI (#8740)
JohannesGaessler Jul 28, 2024
0832de7
[SYCL] add conv support (#8688)
airMeng Jul 29, 2024
439b3fc
cuda : organize vendor-specific headers into vendors directory (#8746)
yeahdongcn Jul 29, 2024
75af08c
ggml: bugfix: fix the inactive elements is agnostic for risc-v vector…
CarterLi999 Jul 29, 2024
c887d8b
[SYCL] Add `TIMESTEP_EMBEDDING` OP (#8707)
zhentaoyu Jul 30, 2024
6e2b600
cann: update cmake (#8765)
wangshuai09 Jul 30, 2024
140074b
flake.lock: Update (#8729)
ggerganov Jul 30, 2024
7c27a19
added android implementation of ggml_print_backtrace_symbols (#8751)
l3utterfly Jul 30, 2024
7e72aa7
py: add_array() will not add to kv store if value is an empty array (…
mofosyne Jul 30, 2024
268c566
nix: cuda: rely on propagatedBuildInputs (#8772)
SomeoneSerge Jul 30, 2024
44d28dd
cmake : fix use of external ggml (#8787)
iboB Jul 31, 2024
398ede5
Adding Gemma 2 2B configs (#8784)
pculliton Jul 31, 2024
ed9d285
Build: Fix potential race condition (#8781)
HanClinto Jul 31, 2024
afbbcf3
server : update llama-server embedding flag documentation (#8779)
okigan Jul 31, 2024
c8a0090
cann: support q8_0 for Ascend backend (#8805)
wangshuai09 Aug 1, 2024
7a11eb3
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800)
slaren Aug 1, 2024
b7a08fd
Build: Only include execinfo.h on linux systems that support it (#8783)
acon96 Aug 1, 2024
afbb4c1
ggml-cuda: Adding support for unified memory (#8035)
matteoserva Aug 1, 2024
0fbbd88
[SYCL] Fixing wrong VDR iq4nl value (#8812)
OuadiElfarouki Aug 2, 2024
e09a800
cann: Fix ggml_cann_im2col for 1D im2col (#8819)
MengqingCao Aug 2, 2024
b72c20b
Fix conversion of unnormalized BF16->BF16 weights (#7843)
CISC Aug 2, 2024
76614f3
ggml : reading the runtime sve config of the cpu (#8709)
jdomke Aug 3, 2024
4b77ea9
flake.lock: Update (#8847)
ggerganov Aug 4, 2024
01aae2b
baby-llama : remove duplicate vector include
danbev Aug 3, 2024
ecf6b7f
batched-bench : handle empty `-npl` (#8839)
cunnie Aug 4, 2024
978ba3d
Server: Don't ignore llama.cpp params (#8754)
ardfork Aug 4, 2024
0d6fb52
Install curl in runtime layer (#8693)
bsquizz Aug 4, 2024
c02b0a8
cann: support q4_0 model (#8822)
wangshuai09 Aug 5, 2024
655858a
ggml : move c parameter comment to ggml_rope_ext (ggml/901)
danbev Jul 29, 2024
a3738b2
vulkan : implement Stable Diffusion operators (ggml/904)
0cc4m Aug 4, 2024
5587e57
sync : ggml
ggerganov Aug 4, 2024
064cdc2
vulkan : fix Qantized Mat-Vec Mul on AMD GPUs for ncols < 64 (#8855)
0cc4m Aug 5, 2024
f1ea514
llama : better replace_all (#8852)
ggerganov Aug 5, 2024
400ae6f
readme : update model list (#8851)
BarfingLemurs Aug 5, 2024
e31a4f6
cmake: fix paths for vulkan shaders compilation on Windows (#8573)
stduhpf Aug 5, 2024
d3f0c71
Stop the generation when <|eom_id|> token is encountered - needed for…
fairydreaming Aug 5, 2024
1ef14b3
py: Add more authorship metadata from model card (#8810)
mofosyne Aug 5, 2024
b9dfc25
ggml : fix overflows in elu function (#8866)
jart Aug 5, 2024
b42978e
readme : add ramalama to the availables UI (#8811)
ericcurtin Aug 5, 2024
bc0f887
cann: fix buffer_num and runtime speed slowly error (#8865)
wangshuai09 Aug 5, 2024
0a4ce78
common : Changed tuple to struct (TODO fix) (#8823)
Septa2112 Aug 5, 2024
d4ff847
[SYCL] correct cmd name (#8877)
arthw Aug 6, 2024
c21a896
[CANN]: Fix ggml_backend_cann_buffer_get_tensor (#8871)
MengqingCao Aug 6, 2024
cdd1889
convert : add support for XLMRoberta embedding models (#8658)
iamlemec Aug 6, 2024
2d5dd7b
ggml : add epsilon as a parameter for group_norm (#8818)
MollySophia Aug 6, 2024
0bf16de
contributing : add note about write access
ggerganov Aug 6, 2024
efda90c
[Vulkan] Fix compilation of `vulkan-shaders-gen` on w64devkit after `…
MaggotHATE Aug 6, 2024
db20f50
cmake : Link vulkan-shaders-gen with pthreads (#8835)
Patater Aug 6, 2024
5f4dcb1
simple : update name of executable to llama-simple (#8885)
danbev Aug 6, 2024
641f5dd
CUDA: fix padding logic for FP16/FP32 (#8884)
JohannesGaessler Aug 6, 2024
1e6f655
server : add lora hotswap endpoint (WIP) (#8857)
ngxson Aug 6, 2024
3195854
typo correction (#8891)
Nexesenex Aug 6, 2024
725e3d9
quantize : update usage comment in quantize.cpp (#8889)
danbev Aug 6, 2024
506122d
llama-bench : add support for getting cpu info on Windows (#8824)
kylo5aby Aug 7, 2024
a8dbc6f
CUDA/HIP: fix tests/test-backend-ops (#8896)
JohannesGaessler Aug 7, 2024
0478174
[SYCL] Updated SYCL device filtering (#8901)
OuadiElfarouki Aug 7, 2024
be55695
ggml-backend : fix async copy from CPU (#8897)
slaren Aug 7, 2024
15fa07a
make : use C compiler to build metal embed object (#8899)
slaren Aug 7, 2024
ebd541a
make : clean llamafile objects (#8923)
DrDub Aug 8, 2024
85fca8d
metal : add abort callback (ggml/905)
conradev Aug 7, 2024
5b33ea1
metal : fix struct name (ggml/912)
ggerganov Aug 7, 2024
f93d49a
ggml : ignore more msvc warnings (ggml/906)
iboB Aug 7, 2024
e44a561
sync : ggml
ggerganov Aug 8, 2024
366d486
scripts : fix sync filenames (#0)
ggerganov Aug 8, 2024
afd27f0
scripts : sync cann files (#0)
ggerganov Aug 8, 2024
3a14e00
gguf-py : simplify support for quant types (#8838)
compilade Aug 8, 2024
345a686
llama : reduce useless copies when saving session (#8916)
compilade Aug 9, 2024
daef3ab
server : add one level list nesting for embeddings (#8936)
gelim Aug 9, 2024
6f6496b
llama : fix typo in llama_tensor_get_type comment [no ci] (#8937)
danbev Aug 9, 2024
5b2c04f
embedding : add --pooling option to README.md [no ci] (#8934)
danbev Aug 9, 2024
70c0ea3
whisper : use vulkan as gpu backend when available (whisper/2302)
mstephenson6 Jul 16, 2024
4305b57
sync : ggml
ggerganov Aug 9, 2024
3071c0a
llava : support MiniCPM-V-2.5 (#7599)
tc-mb Aug 9, 2024
45a55b9
llama : better replace_all (cont) (#8926)
ggerganov Aug 9, 2024
272e3bd
make : fix llava obj file race (#8946)
ggerganov Aug 9, 2024
6afd1a9
llama : add support for lora adapters in T5 model (#8938)
fairydreaming Aug 9, 2024
b72942f
Merge commit from fork
ggerganov Aug 9, 2024
911b437
gguf-py : fix double call to add_architecture() (#8952)
tarilabs Aug 10, 2024
7c3f55c
Add support for encoder-only T5 models (#8900)
fairydreaming Aug 10, 2024
7eb2384
llama : default n_swa for phi-3 (#8931)
ngxson Aug 10, 2024
6e02327
metal : fix uninitialized abort_callback (#8968)
slaren Aug 10, 2024
7c5bfd5
Optimize Vulkan backend for better CPU performance and less GPU synch…
mtavenrath Aug 11, 2024
33309f6
llama : check all graph nodes when searching for result_embd_pooled (…
fairydreaming Aug 11, 2024
a21c6fd
update guide (#8909)
arthw Aug 11, 2024
8cd1bcf
flake.lock: Update (#8979)
ggerganov Aug 11, 2024
4134999
gguf-py : Numpy dequantization for most types (#8939)
compilade Aug 11, 2024
5ef07e2
server : handle models with missing EOS token (#8997)
ggerganov Aug 12, 2024
d3ae0ee
py : fix requirements check '==' -> '~=' (#8982)
ggerganov Aug 12, 2024
2589292
Fix a spelling mistake (#9001)
Septa2112 Aug 12, 2024
df5478f
ggml: fix div-by-zero (#9003)
DavidKorczynski Aug 12, 2024
1262e7e
grammar-parser : fix possible null-deref (#9004)
DavidKorczynski Aug 12, 2024
84eb2f4
docs: introduce gpustack and gguf-parser (#8873)
thxCode Aug 12, 2024
0fd93cd
llama : model-based max number of graph nodes calculation (#8970)
nicoboss Aug 12, 2024
a3d48e4
Simplify and improve CUDA graphs through use of indirect copy pointers
agray3 Aug 1, 2024
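The head commit above, "Simplify and improve CUDA graphs through use of indirect copy pointers" (a3d48e4), is the branch's own change; the preceding commits appear to be upstream llama.cpp work merged into it. As the title suggests, llama.cpp replays a captured CUDA graph for each token, and the destinations of the copy kernels move between tokens, which otherwise requires patching the graph's kernel-node parameters before every launch. Routing the copy through a device-resident pointer table means only the table contents are refreshed while the graph itself stays fixed. The sketch below illustrates that indirection idea in isolation; it is not the llama.cpp implementation, and all names in it (copy_indirect, dest_ptrs, slot, MAX_COPIES) are hypothetical.

// Hedged sketch: "indirect copy destinations" with CUDA graphs, shown in
// isolation. Not llama.cpp's actual code; all identifiers are made up.
#include <cuda_runtime.h>
#include <cstdio>

#define MAX_COPIES 4   // size of the hypothetical pointer table

// The destination is read through a device-resident table at run time, so the
// pointer baked into the captured graph node never has to change.
__global__ void copy_indirect(const float * src, float ** dest_ptrs, int slot, int n) {
    float * dst = dest_ptrs[slot];                 // indirection happens on the GPU
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i];
}

int main() {
    const int n = 1024;
    float * src; float * buf_a; float * buf_b;
    float ** dest_ptrs;                            // device-side pointer table
    cudaMalloc((void **) &src,       n * sizeof(float));
    cudaMalloc((void **) &buf_a,     n * sizeof(float));
    cudaMalloc((void **) &buf_b,     n * sizeof(float));
    cudaMalloc((void **) &dest_ptrs, MAX_COPIES * sizeof(float *));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Capture the copy once; the graph records only (src, dest_ptrs, slot, n).
    cudaGraph_t graph;
    cudaGraphExec_t graph_exec;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    copy_indirect<<<(n + 255) / 256, 256, 0, stream>>>(src, dest_ptrs, 0, n);
    cudaStreamEndCapture(stream, &graph);
    cudaGraphInstantiate(&graph_exec, graph, nullptr, nullptr, 0);

    // Retargeting the copy is now a small write into the pointer table plus a
    // graph replay; no cudaGraphExecKernelNodeSetParams call is needed.
    float * targets[2] = { buf_a, buf_b };
    for (int step = 0; step < 2; ++step) {
        cudaMemcpyAsync(dest_ptrs, &targets[step], sizeof(float *),
                        cudaMemcpyHostToDevice, stream);
        cudaGraphLaunch(graph_exec, stream);
    }
    cudaStreamSynchronize(stream);
    printf("copies done: %s\n", cudaGetLastError() == cudaSuccess ? "ok" : "error");

    cudaGraphExecDestroy(graph_exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(src); cudaFree(buf_a); cudaFree(buf_b); cudaFree(dest_ptrs);
    return 0;
}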
4 changes: 2 additions & 2 deletions .devops/llama-server.Dockerfile
@@ -3,7 +3,7 @@ ARG UBUNTU_VERSION=22.04
 FROM ubuntu:$UBUNTU_VERSION AS build

 RUN apt-get update && \
-apt-get install -y build-essential git libcurl4-openssl-dev curl
+apt-get install -y build-essential git libcurl4-openssl-dev

 WORKDIR /app

@@ -16,7 +16,7 @@ RUN make -j$(nproc) llama-server
 FROM ubuntu:$UBUNTU_VERSION AS runtime

 RUN apt-get update && \
-apt-get install -y libcurl4-openssl-dev libgomp1
+apt-get install -y libcurl4-openssl-dev libgomp1 curl

 COPY --from=build /app/llama-server /llama-server

13 changes: 3 additions & 10 deletions .devops/nix/package.nix
@@ -126,16 +126,9 @@ let
 ++ optionals useMetalKit [ MetalKit ];

 cudaBuildInputs = with cudaPackages; [
-cuda_cccl.dev # <nv/target>
-
-# A temporary hack for reducing the closure size, remove once cudaPackages
-# have stopped using lndir: https://github.com/NixOS/nixpkgs/issues/271792
-cuda_cudart.dev
-cuda_cudart.lib
-cuda_cudart.static
-libcublas.dev
-libcublas.lib
-libcublas.static
+cuda_cudart
+cuda_cccl # <nv/target>
+libcublas
 ];

 rocmBuildInputs = with rocmPackages; [
3 changes: 2 additions & 1 deletion .github/workflows/build.yml
@@ -860,7 +860,8 @@ jobs:
 mkdir build
 cd build
 cmake .. -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_CUDA=ON -DBUILD_SHARED_LIBS=ON
-cmake --build . --config Release -j $((${env:NUMBER_OF_PROCESSORS} - 1))
+cmake --build . --config Release -j $((${env:NUMBER_OF_PROCESSORS} - 1)) -t ggml
+cmake --build . --config Release -j ${env:NUMBER_OF_PROCESSORS}

 - name: Determine tag name
 id: tag
6 changes: 2 additions & 4 deletions .github/workflows/python-check-requirements.yml
@@ -6,15 +6,13 @@ on:
 - '.github/workflows/python-check-requirements.yml'
 - 'scripts/check-requirements.sh'
 - 'convert*.py'
-- 'requirements.txt'
-- 'requirements/*.txt'
+- '**/requirements*.txt'
 pull_request:
 paths:
 - '.github/workflows/python-check-requirements.yml'
 - 'scripts/check-requirements.sh'
 - 'convert*.py'
-- 'requirements.txt'
-- 'requirements/*.txt'
+- '**/requirements*.txt'

 concurrency:
 group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
1 change: 0 additions & 1 deletion .gitignore
@@ -79,7 +79,6 @@ models-mnt
 !models/ggml-vocab-*.gguf*

 # Zig
-
 zig-out/
 zig-cache/

3 changes: 2 additions & 1 deletion CMakeLists.txt
@@ -139,7 +139,8 @@ set(LLAMA_BIN_INSTALL_DIR ${CMAKE_INSTALL_BINDIR} CACHE PATH "Location o
 # determining _precisely_ which defines are necessary for the llama-config
 # package.
 #
-get_directory_property(GGML_DIR_DEFINES DIRECTORY ggml/src COMPILE_DEFINITIONS)
+get_target_property(GGML_DIRECTORY ggml SOURCE_DIR)
+get_directory_property(GGML_DIR_DEFINES DIRECTORY ${GGML_DIRECTORY} COMPILE_DEFINITIONS)
 get_target_property(GGML_TARGET_DEFINES ggml COMPILE_DEFINITIONS)
 set(GGML_TRANSIENT_DEFINES ${GGML_TARGET_DEFINES} ${GGML_DIR_DEFINES})
 get_target_property(GGML_LINK_LIBRARIES ggml LINK_LIBRARIES)
1 change: 1 addition & 0 deletions CONTRIBUTING.md
@@ -5,6 +5,7 @@
 - Execute [the full CI locally on your machine](ci/README.md) before publishing
 - Please rate the complexity of your PR (i.e. `Review Complexity : Low`, `Review Complexity : Medium`, `Review Complexity : High`). This makes it easier for maintainers to triage the PRs.
 - The PR template has a series of review complexity checkboxes `[ ]` that [you can mark as](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/about-task-lists) `[X]` for your convenience
+- Consider allowing write access to your branch for faster review
 - If your PR becomes stale, don't hesitate to ping the maintainers in the comments

 # Pull requests (for collaborators)
67 changes: 37 additions & 30 deletions Makefile
@@ -19,6 +19,7 @@ BUILD_TARGETS = \
 llama-imatrix \
 llama-infill \
 llama-llava-cli \
+llama-minicpmv-cli\
 llama-lookahead \
 llama-lookup \
 llama-lookup-create \
@@ -888,15 +889,16 @@ ggml/src/ggml-metal-embed.o: \
 ggml/src/ggml-common.h
 @echo "Embedding Metal library"
 @sed -e '/#include "ggml-common.h"/r ggml/src/ggml-common.h' -e '/#include "ggml-common.h"/d' < ggml/src/ggml-metal.metal > ggml/src/ggml-metal-embed.metal
-$(eval TEMP_ASSEMBLY=$(shell mktemp))
-@echo ".section __DATA, __ggml_metallib" > $(TEMP_ASSEMBLY)
-@echo ".globl _ggml_metallib_start" >> $(TEMP_ASSEMBLY)
-@echo "_ggml_metallib_start:" >> $(TEMP_ASSEMBLY)
-@echo ".incbin \"ggml/src/ggml-metal-embed.metal\"" >> $(TEMP_ASSEMBLY)
-@echo ".globl _ggml_metallib_end" >> $(TEMP_ASSEMBLY)
-@echo "_ggml_metallib_end:" >> $(TEMP_ASSEMBLY)
-@$(AS) $(TEMP_ASSEMBLY) -o $@
-@rm -f ${TEMP_ASSEMBLY}
+$(eval TEMP_ASSEMBLY=$(shell mktemp -d))
+@echo ".section __DATA, __ggml_metallib" > $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+@echo ".globl _ggml_metallib_start" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+@echo "_ggml_metallib_start:" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+@echo ".incbin \"ggml/src/ggml-metal-embed.metal\"" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+@echo ".globl _ggml_metallib_end" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+@echo "_ggml_metallib_end:" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+$(CC) $(CFLAGS) -c $(TEMP_ASSEMBLY)/ggml-metal-embed.s -o $@
+@rm -f ${TEMP_ASSEMBLY}/ggml-metal-embed.s
+@rmdir ${TEMP_ASSEMBLY}
 endif
 endif # GGML_METAL

@@ -1205,6 +1207,7 @@ clean:
 rm -rvf ggml/*.dll
 rm -rvf ggml/*.so
 rm -vrf ggml/src/*.o
+rm -rvf ggml/src/llamafile/*.o
 rm -rvf common/build-info.cpp
 rm -vrf ggml/src/ggml-metal-embed.metal
 rm -vrf ggml/src/ggml-cuda/*.o
@@ -1451,15 +1454,20 @@ libllava.a: examples/llava/llava.cpp \
 $(CXX) $(CXXFLAGS) -static -fPIC -c $< -o $@ -Wno-cast-qual

 llama-llava-cli: examples/llava/llava-cli.cpp \
-examples/llava/clip.h \
-examples/llava/clip.cpp \
 examples/llava/llava.cpp \
 examples/llava/llava.h \
+examples/llava/clip.cpp \
+examples/llava/clip.h \
 $(OBJ_ALL)
+$(CXX) $(CXXFLAGS) $< $(filter-out %.h $<,$^) -o $@ $(LDFLAGS) -Wno-cast-qual
+
+llama-minicpmv-cli: examples/llava/minicpmv-cli.cpp \
+examples/llava/llava.cpp \
+examples/llava/llava.h \
+examples/llava/clip.cpp \
+examples/llava/clip.h \
+$(OBJ_ALL)
-$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
-$(CXX) $(CXXFLAGS) -c examples/llava/clip.cpp -o $(call GET_OBJ_FILE, examples/llava/clip.cpp) -Wno-cast-qual
-$(CXX) $(CXXFLAGS) -c examples/llava/llava.cpp -o $(call GET_OBJ_FILE, examples/llava/llava.cpp)
-$(CXX) $(CXXFLAGS) $(filter-out %.h $< examples/llava/clip.cpp examples/llava/llava.cpp,$^) $(call GET_OBJ_FILE, $<) $(call GET_OBJ_FILE, examples/llava/clip.cpp) $(call GET_OBJ_FILE, examples/llava/llava.cpp) -o $@ $(LDFLAGS)
+$(CXX) $(CXXFLAGS) $< $(filter-out %.h $<,$^) -o $@ $(LDFLAGS) -Wno-cast-qual

 ifeq ($(UNAME_S),Darwin)
 swift: examples/batched.swift
@@ -1605,42 +1613,41 @@ llama-q8dot: pocs/vdot/q8dot.cpp ggml/src/ggml.o \
 # Mark legacy binary targets as .PHONY so that they are always checked.
 .PHONY: main quantize perplexity embedding server

+# Define the object file target
+examples/deprecation-warning/deprecation-warning.o: examples/deprecation-warning/deprecation-warning.cpp
+$(CXX) $(CXXFLAGS) -c $< -o $@
+
 # NOTE: We currently will always build the deprecation-warning `main` and `server` binaries to help users migrate.
 # Eventually we will want to remove these target from building all the time.
-main: examples/deprecation-warning/deprecation-warning.cpp
-$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
-$(CXX) $(CXXFLAGS) $(filter-out $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
+main: examples/deprecation-warning/deprecation-warning.o
+$(CXX) $(CXXFLAGS) $< -o $@ $(LDFLAGS)
 @echo "NOTICE: The 'main' binary is deprecated. Please use 'llama-cli' instead."

-server: examples/deprecation-warning/deprecation-warning.cpp
-$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
-$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
+server: examples/deprecation-warning/deprecation-warning.o
+$(CXX) $(CXXFLAGS) $< -o $@ $(LDFLAGS)
 @echo "NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead."

-quantize: examples/deprecation-warning/deprecation-warning.cpp
+quantize: examples/deprecation-warning/deprecation-warning.o
 ifneq (,$(wildcard quantize))
-$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
-$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
+$(CXX) $(CXXFLAGS) $< -o $@ $(LDFLAGS)
 @echo "#########"
 @echo "WARNING: The 'quantize' binary is deprecated. Please use 'llama-quantize' instead."
 @echo " Remove the 'quantize' binary to remove this warning."
 @echo "#########"
 endif

-perplexity: examples/deprecation-warning/deprecation-warning.cpp
+perplexity: examples/deprecation-warning/deprecation-warning.o
 ifneq (,$(wildcard perplexity))
-$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
-$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
+$(CXX) $(CXXFLAGS) $< -o $@ $(LDFLAGS)
 @echo "#########"
 @echo "WARNING: The 'perplexity' binary is deprecated. Please use 'llama-perplexity' instead."
 @echo " Remove the 'perplexity' binary to remove this warning."
 @echo "#########"
 endif

-embedding: examples/deprecation-warning/deprecation-warning.cpp
+embedding: examples/deprecation-warning/deprecation-warning.o
 ifneq (,$(wildcard embedding))
-$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
-$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
+$(CXX) $(CXXFLAGS) $< -o $@ $(LDFLAGS)
 @echo "#########"
 @echo "WARNING: The 'embedding' binary is deprecated. Please use 'llama-embedding' instead."
 @echo " Remove the 'embedding' binary to remove this warning."
11 changes: 11 additions & 0 deletions README.md
@@ -95,8 +95,16 @@ Typically finetunes of the base models below are supported as well.
- [x] [SEA-LION](https://huggingface.co/models?search=sea-lion)
- [x] [GritLM-7B](https://huggingface.co/GritLM/GritLM-7B) + [GritLM-8x7B](https://huggingface.co/GritLM/GritLM-8x7B)
- [x] [OLMo](https://allenai.org/olmo)
- [x] [Granite models](https://huggingface.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330)
- [x] [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) + [Pythia](https://github.com/EleutherAI/pythia)
- [x] [Snowflake-Arctic MoE](https://huggingface.co/collections/Snowflake/arctic-66290090abe542894a5ac520)
- [x] [Smaug](https://huggingface.co/models?search=Smaug)
- [x] [Poro 34B](https://huggingface.co/LumiOpen/Poro-34B)
- [x] [Bitnet b1.58 models](https://huggingface.co/1bitLLM)
- [x] [Flan T5](https://huggingface.co/models?search=flan-t5)
- [x] [Open Elm models](https://huggingface.co/collections/apple/openelm-instruct-models-6619ad295d7ae9f868b759ca)
- [x] [ChatGLM3-6b](https://huggingface.co/THUDM/chatglm3-6b) + [ChatGLM4-9b](https://huggingface.co/THUDM/glm-4-9b)
- [x] [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)

(instructions for supporting more models: [HOWTO-add-model.md](./docs/development/HOWTO-add-model.md))

@@ -145,6 +153,7 @@ Unless otherwise noted these projects are open-source with permissive licensing:
- [Faraday](https://faraday.dev/) (proprietary)
- [LMStudio](https://lmstudio.ai/) (proprietary)
- [Layla](https://play.google.com/store/apps/details?id=com.laylalite) (proprietary)
- [ramalama](https://github.com/containers/ramalama) (MIT)
- [LocalAI](https://github.com/mudler/LocalAI) (MIT)
- [LostRuins/koboldcpp](https://github.com/LostRuins/koboldcpp) (AGPL)
- [Mozilla-Ocho/llamafile](https://github.com/Mozilla-Ocho/llamafile)
@@ -177,10 +186,12 @@ Unless otherwise noted these projects are open-source with permissive licensing:

- [akx/ggify](https://github.com/akx/ggify) – download PyTorch models from HuggingFace Hub and convert them to GGML
- [crashr/gppm](https://github.com/crashr/gppm) – launch llama.cpp instances utilizing NVIDIA Tesla P40 or P100 GPUs with reduced idle power consumption
- [gpustack/gguf-parser](https://github.com/gpustack/gguf-parser-go/tree/main/cmd/gguf-parser) - review/check the GGUF file and estimate the memory usage

**Infrastructure:**

- [Paddler](https://github.com/distantmagic/paddler) - Stateful load balancer custom-tailored for llama.cpp
- [GPUStack](https://github.com/gpustack/gpustack) - Manage GPU clusters for running LLMs

**Games:**
- [Lucy's Labyrinth](https://github.com/MorganRO8/Lucys_Labyrinth) - A simple maze game where agents controlled by an AI model will try to trick you.