export_int8_model.py size issue #91
Hi, I'm having trouble with the results of the export_int8_model.py code and would like to ask a question.
The INT8 model published on Hugging Face loads fine, but I'm wondering what causes the size-mismatch error when the INT8 model saved by export_int8_model.py is loaded with Int8OPTForCausalLM.from_pretrained() in examples/smoothquant_opt_real_int8_demo.ipynb (see the load sketch below).
So I modified the code of the torch_int class W8A8B8O8Linear(torch.nn.Module), and this is the result: SmoothQuant INT8 accuracy: 0.407, per-sample latency: 38.878 ms.
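For reference, this is roughly the load step from the demo notebook where the error surfaces. The model path is a hypothetical placeholder for whatever directory export_int8_model.py wrote, and the exact from_pretrained() arguments are assumptions based on the notebook rather than a verified excerpt:

```python
import torch
from smoothquant.opt import Int8OPTForCausalLM

# Hypothetical path: point this at the directory that
# export_int8_model.py saved the quantized model into.
model_path = 'int8-models/opt-1.3b-smoothquant'

# from_pretrained() loads the saved state dict into a freshly
# constructed Int8OPTForCausalLM, so any mismatch between the bias
# shape registered in torch_int's W8A8B8O8Linear and the bias shape
# stored in the checkpoint shows up here as a size error.
model = Int8OPTForCausalLM.from_pretrained(model_path,
                                           torch_dtype=torch.float16,
                                           device_map='auto')
```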
Comments
That's because the bias is initialized with shape (1, self.out_features); you can modify it to (self.out_features,) to solve the problem. In my case, opt-1.3b-int8 accuracy is 0.698 and per-sample latency is 35.392 ms on a 3090.
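To illustrate the suggested fix, here is a minimal sketch of the kind of initializer involved. The buffer layout and constructor signature are assumptions pieced together from this thread, not the exact torch_int source:

```python
import torch

class W8A8B8O8Linear(torch.nn.Module):
    """Sketch of the buffers of an int8-in/int8-out linear layer."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.register_buffer(
            'weight',
            torch.zeros((out_features, in_features), dtype=torch.int8))
        # Problematic initialization: a 2-D bias of shape
        # (1, self.out_features) does not match the 1-D bias tensor in
        # the checkpoint written by export_int8_model.py, so
        # load_state_dict() reports a size mismatch:
        #
        #   self.register_buffer(
        #       'bias', torch.zeros((1, out_features), dtype=torch.int8))
        #
        # Suggested fix: register the bias with shape (self.out_features,)
        # so it matches the saved tensor.
        self.register_buffer(
            'bias',
            torch.zeros((out_features,), dtype=torch.int8))
```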