Omniquant error with Llama 3.2-8B #255

Open
vengdeng opened this issue Dec 11, 2024 · 2 comments

@vengdeng

Hello LLMC Team,

Thank you for your open-source contribution. I am using OmniQuant weight and activation quantization on Llama 3.2-8B, but I ran into an error during deploy_fake_quant_model. Here is the error log:

2024-12-11 00:13:01.963 | INFO | llmc.models.base_model:replace_language_module_all:374 - Replace block index: 0/32
[rank0]: Traceback (most recent call last):
[rank0]: File "/data/dwenlong/llmc/llmc/main.py", line 316, in
[rank0]: main(config)
[rank0]: File "/data/dwenlong/llmc/llmc/main.py", line 154, in main
[rank0]: blockwise_opt.deploy('fake_quant')
[rank0]: File "/data/dwenlong/llmc/llmc/compression/quantization/omniq.py", line 702, in deploy
[rank0]: super().deploy(quant_format)
[rank0]: File "/data/dwenlong/omniquant/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/data/dwenlong/llmc/llmc/compression/quantization/base_blockwise_quantization.py", line 845, in deploy
[rank0]: self.model.replace_language_module_all(
[rank0]: File "/data/dwenlong/llmc/llmc/models/base_model.py", line 380, in replace_language_module_all
[rank0]: self.replace_module_block(module, block, block_idx, params_dict)
[rank0]: File "/data/dwenlong/llmc/llmc/models/base_model.py", line 396, in replace_module_block
[rank0]: self.replace_module_subset(module, block, subset, block_idx, params_dict)
[rank0]: File "/data/dwenlong/llmc/llmc/models/base_model.py", line 417, in replace_module_subset
[rank0]: M = module.new(m, **params_tmp_dict)
[rank0]: File "/data/dwenlong/omniquant/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/data/dwenlong/llmc/llmc/compression/quantization/module_utils.py", line 912, in new
[rank0]: weight = w_qdq(module)
[rank0]: File "/data/dwenlong/llmc/llmc/compression/quantization/omniq.py", line 693, in w_qdq
[rank0]: if module.dynamic_quant_weight:
[rank0]: File "/data/dwenlong/omniquant/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in getattr
[rank0]: raise AttributeError(f"'{type(self).name}' object has no attribute '{name}'")
[rank0]: AttributeError: 'OriginFloatLinear' object has no attribute 'dynamic_quant_weight'
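
For context, the crash happens because w_qdq reads module.dynamic_quant_weight on a module (OriginFloatLinear) that was never given that attribute, and nn.Module.__getattr__ turns the missing name into an AttributeError. Below is a minimal defensive sketch of a workaround, assuming the attribute is only ever set on modules OmniQuant has processed; safe_w_qdq and w_qdq_fn are hypothetical names, not llmc API:

import torch.nn as nn

def safe_w_qdq(w_qdq_fn, module: nn.Module):
    """Hypothetical guard: only run the real w_qdq when the module
    carries OmniQuant's bookkeeping attribute."""
    # nn.Module.__getattr__ raises AttributeError for names that are neither
    # parameters, buffers, nor submodules -- exactly the crash in the log above.
    if hasattr(module, 'dynamic_quant_weight'):
        return w_qdq_fn(module)
    # Modules OmniQuant never touched (e.g. OriginFloatLinear) keep float weights.
    return module.weight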

@gushiqiao
Contributor

Can you provide the configuration file?

@vengdeng
Author

Yes, here is the configuration file. The error occurs at line 154 in the main file; when I skipped the evaluation part, I successfully saved the model in vLLM format, but then hit another issue, which I reported in a separate GitHub issue.

base:
    seed: &seed 42
model:
    type: Llama
    path: meta-llama/Meta-Llama-3-8B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: wikitext2
    download: True
    path: /save_files/data
    n_samples: 128
    bs: 1
    seq_len: 2048
    preproc: wikitext2_gptq
    seed: *seed
eval:
    eval_pos: [fake_quant]
    name: wikitext2
    download: True
    path: /save_files/data
    seq_len: 2048
    # For 7B / 13B model eval, bs can be set to "1", and inference_per_block can be set to "False".
    # For 70B model eval, bs can be set to "20", and inference_per_block can be set to "True".
    bs: 1
    inference_per_block: False
quant:
    method: OmniQuant
    weight:
        bit: 8
        symmetric: False
        granularity: per_channel
        calib_algo: learnable
        ste: True
    act:
        bit: 8
        symmetric: False
        granularity: per_token
        ste: True
    special:
        aug_loss: False
        lwc: True
        let: True
        lwc_lr: 0.001
        let_lr: 0.001
        use_shift: False
        alpha: 0.5
        deactive_amp: True
        epochs: 5
        wd: 0
        # Use AWQ's search clip factors to initialize OmniQuant's clip factors,
        # then refine them through learning (LWC).
        search_clip_init: False
        load_clip: False
        clip_path: save_files
        # Use AWQ's search scale factors to initialize OmniQuant's scale factors,
        # then refine them through learning (LET).
        search_scale_init: False
        scale_path: save_files
        robust_weight: 0
    quant_out: True
save:
    save_trans: False
    save_fake: False
    save_vllm: True
    save_path: ./save

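Since indentation is easy to lose when pasting YAML, a quick parse of the file can rule out config-shape problems before launching a run. This is a hypothetical sanity-check snippet, not part of llmc, and the filename is an assumption:

import yaml  # PyYAML

# Hypothetical check: parse the config and confirm the W8A8 OmniQuant
# settings landed under the expected keys.
with open('llama3_omniquant_w8a8.yml') as f:  # filename is an assumption
    cfg = yaml.safe_load(f)

assert cfg['quant']['method'] == 'OmniQuant'
assert cfg['quant']['weight']['bit'] == 8 and cfg['quant']['act']['bit'] == 8
print(cfg['eval']['eval_pos'])  # ['fake_quant'] triggers the failing deploy path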