Omniquant error with Llama 3.2-8B #255

Open
vengdeng opened this issue Dec 11, 2024 · 2 comments

@vengdeng

Hello LLMC Team,

Thank you for your open-source contribution. I am using OmniQuant weight and activation quantization on Llama 3.2-8B, but I ran into an error during deploy_fake_quant_model. Here is the error log:

2024-12-11 00:13:01.963 | INFO | llmc.models.base_model:replace_language_module_all:374 - Replace block index: 0/32
[rank0]: Traceback (most recent call last):
[rank0]: File "/data/dwenlong/llmc/llmc/main.py", line 316, in
[rank0]: main(config)
[rank0]: File "/data/dwenlong/llmc/llmc/main.py", line 154, in main
[rank0]: blockwise_opt.deploy('fake_quant')
[rank0]: File "/data/dwenlong/llmc/llmc/compression/quantization/omniq.py", line 702, in deploy
[rank0]: super().deploy(quant_format)
[rank0]: File "/data/dwenlong/omniquant/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/data/dwenlong/llmc/llmc/compression/quantization/base_blockwise_quantization.py", line 845, in deploy
[rank0]: self.model.replace_language_module_all(
[rank0]: File "/data/dwenlong/llmc/llmc/models/base_model.py", line 380, in replace_language_module_all
[rank0]: self.replace_module_block(module, block, block_idx, params_dict)
[rank0]: File "/data/dwenlong/llmc/llmc/models/base_model.py", line 396, in replace_module_block
[rank0]: self.replace_module_subset(module, block, subset, block_idx, params_dict)
[rank0]: File "/data/dwenlong/llmc/llmc/models/base_model.py", line 417, in replace_module_subset
[rank0]: M = module.new(m, **params_tmp_dict)
[rank0]: File "/data/dwenlong/omniquant/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/data/dwenlong/llmc/llmc/compression/quantization/module_utils.py", line 912, in new
[rank0]: weight = w_qdq(module)
[rank0]: File "/data/dwenlong/llmc/llmc/compression/quantization/omniq.py", line 693, in w_qdq
[rank0]: if module.dynamic_quant_weight:
[rank0]: File "/data/dwenlong/omniquant/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in getattr
[rank0]: raise AttributeError(f"'{type(self).name}' object has no attribute '{name}'")
[rank0]: AttributeError: 'OriginFloatLinear' object has no attribute 'dynamic_quant_weight'
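
For context, the crash happens because w_qdq reads module.dynamic_quant_weight on a module (OriginFloatLinear) that was never given that attribute, and nn.Module.__getattr__ turns the missing name into an AttributeError. Below is a minimal defensive sketch of a workaround, assuming the attribute is only ever set on modules OmniQuant has processed; safe_w_qdq and w_qdq_fn are hypothetical names, not llmc API:

import torch.nn as nn

def safe_w_qdq(w_qdq_fn, module: nn.Module):
    """Hypothetical guard: only run the real w_qdq when the module
    carries OmniQuant's bookkeeping attribute."""
    # nn.Module.__getattr__ raises AttributeError for names that are neither
    # parameters, buffers, nor submodules -- exactly the crash in the log above.
    if hasattr(module, 'dynamic_quant_weight'):
        return w_qdq_fn(module)
    # Modules OmniQuant never touched (e.g. OriginFloatLinear) keep float weights.
    return module.weight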

@gushiqiao
Contributor

Can you provide the configuration file?

@vengdeng
Author

Yes, here is the configuration file. The error occurs at line 154 in the main file; when I skipped the evaluation part, I successfully saved the model in vLLM format, but then hit another issue, which I reported in a separate GitHub issue.

base:
    seed: &seed 42
model:
    type: Llama
    path: meta-llama/Meta-Llama-3-8B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: wikitext2
    download: True
    path: /save_files/data
    n_samples: 128
    bs: 1
    seq_len: 2048
    preproc: wikitext2_gptq
    seed: *seed
eval:
    eval_pos: [fake_quant]
    name: wikitext2
    download: True
    path: /save_files/data
    seq_len: 2048
    # For 7B / 13B model eval, bs can be set to "1", and inference_per_block can be set to "False".
    # For 70B model eval, bs can be set to "20", and inference_per_block can be set to "True".
    bs: 1
    inference_per_block: False
quant:
    method: OmniQuant
    weight:
        bit: 8
        symmetric: False
        granularity: per_channel
        calib_algo: learnable
        ste: True
    act:
        bit: 8
        symmetric: False
        granularity: per_token
        ste: True
    special:
        aug_loss: False
        lwc: True
        let: True
        lwc_lr: 0.001
        let_lr: 0.001
        use_shift: False
        alpha: 0.5
        deactive_amp: True
        epochs: 5
        wd: 0
        # Use AWQ's search clip factors to initialize OmniQuant's clip factors,
        # then refine them through learning (LWC).
        search_clip_init: False
        load_clip: False
        clip_path: save_files
        # Use AWQ's search scale factors to initialize OmniQuant's scale factors,
        # then refine them through learning (LET).
        search_scale_init: False
        scale_path: save_files
        robust_weight: 0
    quant_out: True
save:
    save_trans: False
    save_fake: False
    save_vllm: True
    save_path: ./save

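Since indentation is easy to lose when pasting YAML, a quick parse of the file can rule out config-shape problems before launching a run. This is a hypothetical sanity-check snippet, not part of llmc, and the filename is an assumption:

import yaml  # PyYAML

# Hypothetical check: parse the config and confirm the W8A8 OmniQuant
# settings landed under the expected keys.
with open('llama3_omniquant_w8a8.yml') as f:  # filename is an assumption
    cfg = yaml.safe_load(f)

assert cfg['quant']['method'] == 'OmniQuant'
assert cfg['quant']['weight']['bit'] == 8 and cfg['quant']['act']['bit'] == 8
print(cfg['eval']['eval_pos'])  # ['fake_quant'] triggers the failing deploy path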