Added test llama-2-7b with GPTQ quant. scheme #141
Hi, thanks for the feedback. Should I create a default file for llama, as was done for gpt and timm? Also, I am currently working on a machine with a single GPU, hence the backend.device_ids: 0
No need, just reuse the ones already provided as much as possible and make the rest explicit:

defaults:
  - backend: pytorch # default backend
  # order of inheritance, last one overrides previous ones
  - _base_ # inherits from base config
  - _inference_ # inherits from inference config
  - _cuda_ # inherits from cuda config
  - _self_ # hydra 1.1 compatibility

experiment_name: cuda_inference_pytorch_gptq

backend:
  model: TheBloke/TinyLlama-1.1B-Chat-v0.3-GPTQ
  quantization_config:
    exllama_config:
      version: 2

hydra:
  sweeper:
    params:
      backend.no_weights: true,false
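As a rough illustration of what the `hydra.sweeper.params` entry in the config above asks for: assuming Hydra's basic sweeper treats a comma-separated value as a list of choices and launches one run per combination, the sweep expands to two runs, one with `backend.no_weights=true` and one with `backend.no_weights=false`. The helper `expand_sweep` below is hypothetical and is not part of Hydra or of this project; it only mimics that expansion.

```python
from itertools import product


def expand_sweep(params):
    """Expand comma-separated sweep values into one override dict per run.

    Hypothetical helper: it imitates the Cartesian-product expansion of a
    multirun sweep, it does not call Hydra.
    """
    keys = list(params)
    # Each value such as "true,false" becomes a list of choices.
    choices = [str(params[key]).split(",") for key in keys]
    # One run per combination of choices, in declaration order.
    return [dict(zip(keys, combo)) for combo in product(*choices)]


runs = expand_sweep({"backend.no_weights": "true,false"})
for overrides in runs:
    print(overrides)
```

With a single swept key the test config therefore runs twice; adding a second swept key with two values would double that to four runs.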
Hi @IlyasMoutawwakil, I just modified the file based on your feedback. Let me know if there is anything else I can do.
@lopozz thanks. Can you also add the GPTQ pip installation to the test workflow? You will have to use the index URLs from https://github.com/AutoGPTQ/AutoGPTQ?tab=readme-ov-file#installation
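A minimal sketch of the kind of install step requested above, assuming the workflow calls pip directly. `--extra-index-url` is a standard pip option; the helper `build_pip_install` and the `PLACEHOLDER_INDEX` value are hypothetical, and the real index URL for a given CUDA or ROCm version has to be taken from the AutoGPTQ installation instructions linked in the comment.

```python
import sys


def build_pip_install(packages, extra_index_url=None):
    """Return the argument list for a pip install of the given packages.

    Hypothetical helper for illustration; it only builds the command and
    does not run it.
    """
    cmd = [sys.executable, "-m", "pip", "install", *packages]
    if extra_index_url:
        # Extra index searched in addition to PyPI for matching wheels.
        cmd += ["--extra-index-url", extra_index_url]
    return cmd


# Placeholder only: substitute the index URL documented by AutoGPTQ.
PLACEHOLDER_INDEX = "https://example.invalid/whl/"

cmd = build_pip_install(["optimum", "auto-gptq"], PLACEHOLDER_INDEX)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would perform the installation; in a CI workflow the same arguments can be written as a single shell line.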
Thanks for the support @IlyasMoutawwakil. It is my first pull request, but I think I solved the issue. Also, following the comment #144 (comment), I did not modify setup.py. Main changes: pip installation of the optimum and auto-gptq packages. Let me know if you have further feedback.
@lopozz thanks a lot for working on this.
"bitsandbytes": ["bitsandbytes"],
"auto-gptq": ["optimum", "auto-gptq"],
This will probably not work with ROCm 5.6 & 5.7 and CUDA 11.8 😅, because these systems need an extra index URL to download the right wheels, as explained in: https://github.com/AutoGPTQ/AutoGPTQ?tab=readme-ov-file#installation
@IlyasMoutawwakil
I did not use dependency_links as you suggested because it is deprecated: pip has not supported it since version 19.0 (released 2019-01-22), see https://setuptools.pypa.io/en/latest/deprecated/dependency_links.html#specifying-dependencies-that-aren-t-in-pypi-via-dependency-links. Let me know if you agree; otherwise I can modify it.
Thanks, I left some comments. Don't forget to run styling.
okay found why, in the code I use
Thanks for the addition @lopozz great work on your first PR 🤗 |
test TheBloke/Llama-2-7B-GPTQ with pytorch backend and cuda hardware. #95