Added test llama-2-7b with GPTQ quant. scheme #141

lopozz · 2024-02-29T14:55:34Z

Adds a test for TheBloke/Llama-2-7B-GPTQ with the pytorch backend on cuda hardware. #95

IlyasMoutawwakil · 2024-03-01T02:41:02Z

Thanks a lot for the PR! I have a few suggestions:

let's use TheBloke/TinyLlama-1.1B-Chat-v0.3-GPTQ to reduce download time, like in trt-llm (this is just the smallest llama I can think of)
let's use the compose api, i.e. inherit from other configs instead of rewriting them, like here.

lopozz · 2024-03-01T14:43:33Z

Hi, thanks for the feedback. Should I create a default file for llama, as was done for gpt or timm? Also, I am currently working on a machine with one GPU, hence the backend.device_ids: 0.

IlyasMoutawwakil · 2024-03-04T07:06:46Z

No need, just reuse the ones already provided as much as possible and make the rest explicit:

defaults:
  - backend: pytorch # default backend
  # order of inheritance, last one overrides previous ones
  - _base_ # inherits from base config
  - _inference_ # inherits from inference config
  - _cuda_ # inherits from cuda config
  - _self_ # hydra 1.1 compatibility

experiment_name: cuda_inference_pytorch_gptq

backend:
  model: TheBloke/TinyLlama-1.1B-Chat-v0.3-GPTQ
  quantization_config:
    exllama_config:
      version: 2

hydra:
  sweeper:
    params:
      backend.no_weights: true,false
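
To illustrate the "last one overrides previous ones" rule and the sweep in the config above, here is a minimal pure-Python sketch. It does not use Hydra itself: the `deep_merge` helper and the contents of the `base`, `inference`, and `cuda` dictionaries are hypothetical stand-ins for the real `_base_`, `_inference_`, and `_cuda_` configs, whose actual keys are not shown in this thread.

```python
from copy import deepcopy
from itertools import product


def deep_merge(lower, higher):
    """Return a new dict where values from `higher` override `lower`, recursing into nested dicts."""
    merged = deepcopy(lower)
    for key, value in higher.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-ins for the inherited configs (the real files contain other keys).
base = {"experiment_name": "base", "backend": {"name": "pytorch"}}
inference = {"benchmark": {"name": "inference"}}
cuda = {"backend": {"device": "cuda"}}

# Keys taken from the test config in this thread; `_self_` comes last, so it wins.
this_config = {
    "experiment_name": "cuda_inference_pytorch_gptq",
    "backend": {
        "model": "TheBloke/TinyLlama-1.1B-Chat-v0.3-GPTQ",
        "quantization_config": {"exllama_config": {"version": 2}},
    },
}

# Merge in the order of the defaults list: later layers override earlier ones.
config = {}
for layer in (base, inference, cuda, this_config):
    config = deep_merge(config, layer)

# The sweeper entry `backend.no_weights: true,false` expands into one run per value.
sweep = {"backend.no_weights": [True, False]}
runs = []
for values in product(*sweep.values()):
    run = deepcopy(config)
    for dotted_key, value in zip(sweep.keys(), values):
        *parents, leaf = dotted_key.split(".")
        node = run
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    runs.append(run)
```

With this layering, the test config only needs to spell out what differs from the inherited files, and the sweep produces two runs that differ only in `backend.no_weights`.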

lopozz · 2024-03-06T11:04:02Z

Hi @IlyasMoutawwakil, I just modified the file based on your feedback. Let me know if there is anything else I can do.

IlyasMoutawwakil · 2024-03-08T17:43:34Z

@lopozz thanks, can you also add the gptq pip installation to the test workflow? You will have to use the index urls from https://github.com/AutoGPTQ/AutoGPTQ?tab=readme-ov-file#installation

lopozz · 2024-03-13T18:12:55Z

Thanks for the support @IlyasMoutawwakil, it is my first pull request, but I think I solved the issue. Also, given the comment in #144 (comment), I did not modify setup.py.

Main changes: pip installation of optimum and auto-gptq packages

Let me know if you have further feedback.

IlyasMoutawwakil · 2024-03-15T09:11:25Z

@lopozz thanks a lot for working on this.
optimum and auto-gptq are still missing from setup.py's extras.
I would suggest not adding optimum as a standalone but rather grouping both deps in auto-gptq

    "bitsandbytes": ["bitsandbytes"],
    "auto-gptq": ["optimum", "auto-gptq"],

This will probably not work with ROCm 5.6 & 5.7 and CUDA 11.8 😅 as for these systems an extra url is required to download the right wheels, as explained in: https://github.com/AutoGPTQ/AutoGPTQ?tab=readme-ov-file#installation
It can be solved using dependency_links as explained in https://stackoverflow.com/a/30064248, but tell me if this is too complicated for the scope of your PR.
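
For reference, the grouping suggested above could sit in setup.py roughly as follows. This is only a sketch: the two extras entries come from this comment, while the variable name `EXTRAS_REQUIRE` and the `requirements_for` helper are assumptions added for illustration.

```python
# Sketch of the extras mapping; only these two entries come from the suggestion above.
EXTRAS_REQUIRE = {
    "bitsandbytes": ["bitsandbytes"],
    # optimum is grouped under the auto-gptq extra instead of being its own extra
    "auto-gptq": ["optimum", "auto-gptq"],
}


def requirements_for(extras):
    """Hypothetical helper: flatten the requirements of the selected extras, in order, without duplicates."""
    seen = []
    for extra in extras:
        for requirement in EXTRAS_REQUIRE[extra]:
            if requirement not in seen:
                seen.append(requirement)
    return seen
```

In a real setup.py such a dict would be passed as `extras_require=EXTRAS_REQUIRE` to `setuptools.setup`, so that `pip install -e .[auto-gptq]` pulls in both packages from the default index.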

lopozz · 2024-03-19T16:39:16Z

@IlyasMoutawwakil
update of the last commit:

Added 3 new identifiers in setup.py for cuda121, cuda118 and rocm to install optimum and auto-gptq from the corresponding index urls.

I did not use dependency_links as you suggested because it has been deprecated since pip version 19.0 (released 2019-01-22): https://setuptools.pypa.io/en/latest/deprecated/dependency_links.html#specifying-dependencies-that-aren-t-in-pypi-via-dependency-links. Let me know if you agree; if not, I can modify it.

IlyasMoutawwakil · 2024-03-20T09:33:05Z

Thanks, I left some comments, don't forget to run styling.
There appears to be a SIGSEGV when testing GPTQ, probably caused by something wrong in the cleanup code. I'll investigate the cause.

IlyasMoutawwakil · 2024-03-20T10:10:26Z

Okay, found why: in the code I use exllama_version, but the key is supposed to be just version. So g_idx is not created in the no-weights model, resulting in a SIGSEGV. I will fix this in another PR and merge it quickly.
fixed in #165
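
The general failure mode here, a lookup under the wrong key that silently falls back to a default, can be sketched as follows. This is not the actual optimum-benchmark code: the function names and the fallback value are assumptions chosen for the example, and the sketch does not model how g_idx gets created.

```python
# Hypothetical illustration of the key mismatch; this is not the optimum-benchmark code.
ASSUMED_DEFAULT_VERSION = 1  # assumed fallback, chosen only for the example

user_config = {"exllama_config": {"version": 2}}  # key as written in the test config


def read_version_buggy(quantization_config):
    # Reads a key the user never set, so the fallback is returned silently.
    exllama_config = quantization_config.get("exllama_config", {})
    return exllama_config.get("exllama_version", ASSUMED_DEFAULT_VERSION)


def read_version_fixed(quantization_config):
    # Reads the key that the config actually uses.
    exllama_config = quantization_config.get("exllama_config", {})
    return exllama_config.get("version", ASSUMED_DEFAULT_VERSION)
```

Because the buggy lookup raises no error, the mismatch only shows up later, when code that depends on the setting takes a different path than the user intended.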

IlyasMoutawwakil · 2024-03-21T08:21:35Z

Thanks for the addition @lopozz great work on your first PR 🤗

lopozz added 2 commits February 27, 2024 23:24

Added test llama-2-7B-GPTQ

48cb08c

Small fixes test llama-2-7B-GPTQ

bca1da4

lopozz closed this Feb 29, 2024

lopozz reopened this Feb 29, 2024

Inheritance form standard configuration

f67d71b

lopozz and others added 4 commits March 13, 2024 14:51

Merge branch 'huggingface:main' into main

c642905

Added auto-gptq installation

0aa1286

Added optimum installation

37d551f

Small fix test cli cuda

c05e6ee

Added optimum and auto-gptq to setup.py with cuda and rocm version

73a4cd1

IlyasMoutawwakil requested changes Mar 20, 2024

added not gptq for rocm tests and removed auto-gptq-rocm from setup.py

1de4569

IlyasMoutawwakil self-requested a review March 21, 2024 08:22

IlyasMoutawwakil approved these changes Mar 21, 2024

IlyasMoutawwakil merged commit 38b89e7 into huggingface:main Mar 21, 2024
21 of 22 checks passed
