Hello LLMC Team,

Thank you for your open-source contribution!

I am using OmniQuant weight-and-activation quantization on Llama 3.2-8B, but I ran into an error in `deploy_fake_quant_model`. Here is the error log:
```
2024-12-11 00:13:01.963 | INFO | llmc.models.base_model:replace_language_module_all:374 - Replace block index: 0/32
[rank0]: Traceback (most recent call last):
[rank0]:   File "/data/dwenlong/llmc/llmc/main.py", line 316, in <module>
[rank0]:     main(config)
[rank0]:   File "/data/dwenlong/llmc/llmc/main.py", line 154, in main
[rank0]:     blockwise_opt.deploy('fake_quant')
[rank0]:   File "/data/dwenlong/llmc/llmc/compression/quantization/omniq.py", line 702, in deploy
[rank0]:     super().deploy(quant_format)
[rank0]:   File "/data/dwenlong/omniquant/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/data/dwenlong/llmc/llmc/compression/quantization/base_blockwise_quantization.py", line 845, in deploy
[rank0]:     self.model.replace_language_module_all(
[rank0]:   File "/data/dwenlong/llmc/llmc/models/base_model.py", line 380, in replace_language_module_all
[rank0]:     self.replace_module_block(module, block, block_idx, params_dict)
[rank0]:   File "/data/dwenlong/llmc/llmc/models/base_model.py", line 396, in replace_module_block
[rank0]:     self.replace_module_subset(module, block, subset, block_idx, params_dict)
[rank0]:   File "/data/dwenlong/llmc/llmc/models/base_model.py", line 417, in replace_module_subset
[rank0]:     M = module.new(m, **params_tmp_dict)
[rank0]:   File "/data/dwenlong/omniquant/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/data/dwenlong/llmc/llmc/compression/quantization/module_utils.py", line 912, in new
[rank0]:     weight = w_qdq(module)
[rank0]:   File "/data/dwenlong/llmc/llmc/compression/quantization/omniq.py", line 693, in w_qdq
[rank0]:     if module.dynamic_quant_weight:
[rank0]:   File "/data/dwenlong/omniquant/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
[rank0]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'OriginFloatLinear' object has no attribute 'dynamic_quant_weight'
```
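From the traceback, `w_qdq` reads `module.dynamic_quant_weight` on an `OriginFloatLinear` that was never given that attribute, so `nn.Module.__getattr__` raises. A minimal sketch of one possible workaround (hypothetical patch, not the maintainers' fix; `OriginFloatLinear`/`FakeQuantLinear` below are plain stand-in classes, not the real llmc modules) is to fall back to a default via `getattr`:

```python
class OriginFloatLinear:
    """Stand-in for llmc's OriginFloatLinear: never carries the flag."""
    pass

class FakeQuantLinear:
    """Stand-in for a quantized module that does carry the flag."""
    dynamic_quant_weight = True

def w_qdq_safe(module):
    # getattr with a default avoids AttributeError when the module
    # was never assigned `dynamic_quant_weight` (treat as static).
    if getattr(module, 'dynamic_quant_weight', False):
        return 'dynamic'
    return 'static'

print(w_qdq_safe(OriginFloatLinear()))  # static
print(w_qdq_safe(FakeQuantLinear()))    # dynamic
```

This only silences the lookup; whether `OriginFloatLinear` should ever reach `w_qdq` during deploy is the underlying question for the maintainers.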
Yes, here is the configuration file. The failure is at line 154 in the main file. When I skipped the evaluation part, I successfully saved the model for vLLM, but then hit another issue, which I reported in a separate GitHub issue.
```yaml
base:
    seed: &seed 42
model:
    type: Llama
    path: meta-llama/Meta-Llama-3-8B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: wikitext2
    download: True
    path: /save_files/data
    n_samples: 128
    bs: 1
    seq_len: 2048
    preproc: wikitext2_gptq
    seed: *seed
eval:
    eval_pos: [fake_quant]
    name: wikitext2
    download: True
    path: /save_files/data
    seq_len: 2048
    # For 7B / 13B model eval, bs can be set to "1", and inference_per_block can be set to "False".
    # For 70B model eval, bs can be set to "20", and inference_per_block can be set to "True".
    bs: 1
    inference_per_block: False
quant:
    method: OmniQuant
    weight:
        bit: 8
        symmetric: False
        granularity: per_channel
        calib_algo: learnable
        ste: True
    act:
        bit: 8
        symmetric: False
        granularity: per_token
        ste: True
    special:
        aug_loss: False
        lwc: True
        let: True
        lwc_lr: 0.001
        let_lr: 0.001
        use_shift: False
        alpha: 0.5
        deactive_amp: True
        epochs: 5
        wd: 0
        # Use AWQ's search clip factors to initialize OmniQuant's clip factors,
        # then refine them through learning (LWC).
        search_clip_init: False
        load_clip: False
        clip_path: save_files
        # Use AWQ's search scale factors to initialize OmniQuant's scale factors,
        # then refine them through learning (LET).
        search_scale_init: False
        scale_path: save_files
        robust_weight: 0
    quant_out: True
save:
    save_trans: False
    save_fake: False
    save_vllm: True
    save_path: ./save
```