Add support for ggllm.cpp #3357
Just my two cents: I personally don't recommend quantizing anything down to 2 bits, or to anything lower than 8 bits for that matter. The model degrades significantly.
Not technically a duplicate, but close enough to #3351. CTransformers is what would be used if ggllm.cpp were to be integrated.
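For context, loading a GGML Falcon model through ctransformers outside the webui looks roughly like the sketch below. The file path, quantization variant, and `gpu_layers` value are placeholders, not details taken from this thread:

```python
from ctransformers import AutoModelForCausalLM

# Placeholder path/filename -- point this at whichever GGML Falcon file you actually have.
llm = AutoModelForCausalLM.from_pretrained(
    "models/falcon-40b-instruct.q3_K_S.bin",  # hypothetical file name
    model_type="falcon",  # selects ctransformers' Falcon (ggllm.cpp-derived) backend
    gpu_layers=60,        # layers to offload to the GPU; set 0 for CPU-only
)

print(llm("The Falcon models are"))
```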
Not sure I understood you correctly. I am using the 3-bit quantized version (
3-bit is worth using only on the larger model sizes (30B+), where it makes less of a difference. It does lower output quality significantly, but it is still worth it if it lets you run a larger model. 2-bit is pretty much useless, though.
Try a 2-bit model on a moderately difficult coding task and you will see what he is talking about. Anything that requires precision is out of the question. It is good that, for your use case, a lower quantization still produces output that makes sense.
ctransformers is on my radar, I'll merge one of the open PRs adding support soon. It's always a challenge to add new backends because they usually don't come with precompiled wheels.
I'm currently in the process of building pre-compiled wheels for CUDA 11.7. Fortunately, ctransformers already handles CUDA and non-CUDA builds internally, so a separate package won't be needed like with llama-cpp-python.
That's very nice to hear @jllllll.
This wheel includes CUDA binaries for both Windows and Linux. macOS is also supported through non-CUDA binaries.
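(As a side note that may postdate this exchange: recent ctransformers releases can also pull the CUDA runtime libraries from PyPI via `pip install ctransformers[cuda]`, while a plain `pip install ctransformers` gives the CPU-only build.)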
+1 on this request, mostly for running Falcon 40B quantized in GGML (in my case, on Apple Silicon).
I note #3351 and #3313 are now done, so does that mean that this is now working, or just that it's unblocked?
It should work.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Still doesn't work for me, I'm on a MacBook Pro M2 Max (Apple Silicon):
Installing collected packages: ctransformers
Then when trying to load the model after restarting the server:
Might be because it's tied to AVX2: |
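If the prebuilt wheel really is the issue, one possible workaround (untested here, so treat it as an assumption) is to have pip compile ctransformers from source on the M2 so the build targets the host CPU instead of assuming AVX2, e.g. `pip install ctransformers --no-binary ctransformers`; the ctransformers README also documents a `CT_METAL=1` variant of the same command for Metal acceleration.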
Falcon is one of the very few good multilingual models. Support for the Falcon family of models (7B / 40B) in text-generation-webui is currently very limited: 4-bit only, with poor performance, through AutoGPTQ. It also needs at least 35 GB of VRAM.
ggllm.cpp is optimized for running quantized versions of those models and runs much faster. It also supports quantization down to 2 bits, which allows running the 40B model on a single 24 GB GPU.
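As a rough back-of-the-envelope check on that claim: at an effective ~2.5-3 bits per weight, 40B parameters come to roughly 40e9 × 3 / 8 ≈ 15 GB of weights, which leaves headroom on a 24 GB card for the KV cache and scratch buffers (the exact footprint depends on the quantization mix and context length).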