Did you improve the performance using per-channel weight quantization? #10
Comments
I have tried it on ResNet/ImageNet, but I found that choosing the initial value of the hyperparameter s is very tricky. (And I don't have enough GPUs to run many experiments; it eats up a lot of my spare time. :D)
I used your implementation on MobileNetV2 on ImageNet and quantized only the conv weights to 4 bits (the fc weights and activations stay in float). It didn't work well: even with per-channel quantization, the top-1 accuracy only reaches about 68% (the float baseline is 71.88%). Any advice?
You can try modifying: 1) the scaling factor of the gradients, and 2) the initialization value of s. You can also read another paper, LSQ+ (https://arxiv.org/abs/2004.09576), which analyzes the disadvantages of LSQ and offers some advice. A sketch of those two knobs is below.
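For reference, here is a minimal sketch of both knobs, assuming the formulas from the original LSQ paper rather than this repo's exact code: the step size is initialized as s = 2·mean(|w|)/√Q_p, and the gradient of s is scaled by g = 1/√(N·Q_p). The function names are illustrative.

```python
import torch

def init_step_size(weight: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    # LSQ paper's initialization: s = 2 * mean(|w|) / sqrt(Q_p),
    # where Q_p = 2**(b-1) - 1 is the positive bound for signed b-bit weights.
    q_p = 2 ** (n_bits - 1) - 1
    return 2 * weight.abs().mean() / (q_p ** 0.5)

def scale_grad(s: torch.Tensor, n_weights: int, n_bits: int = 4) -> torch.Tensor:
    # LSQ paper's gradient scale: g = 1 / sqrt(N * Q_p). The expression below
    # leaves the forward value of s unchanged but multiplies its gradient by g.
    q_p = 2 ** (n_bits - 1) - 1
    g = 1.0 / ((n_weights * q_p) ** 0.5)
    return (s - s * g).detach() + s * g
```

Since LSQ+ argues this initialization can be suboptimal for some architectures, sweeping a few alternative values of s (and of g) on the stubborn layers is a reasonable first experiment.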
Thanks~
Hi,
Great implementation! Since per-channel weight quantization is implemented in your code, I'm wondering whether it gives any improvement over per-tensor weight quantization.
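For concreteness, a minimal sketch of the difference between the two modes, assuming conv weights of shape (out_channels, in_channels, kH, kW); the function name and layout here are illustrative, not this repo's actual API:

```python
import torch

def fake_quantize_weight(w: torch.Tensor, s: torch.Tensor, n_bits: int = 4,
                         per_channel: bool = True) -> torch.Tensor:
    # Signed symmetric quantization bounds for b-bit weights.
    q_n, q_p = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    if per_channel:
        # One learned step size per output channel, broadcast over the
        # remaining dimensions; per-tensor uses a single scalar s instead.
        s = s.view(-1, 1, 1, 1)
    w_s = torch.clamp(w / s, q_n, q_p)
    # Straight-through estimator: round in the forward pass, identity gradient.
    w_q = (w_s.round() - w_s).detach() + w_s
    return w_q * s
```

Per-channel tends to help most when weight ranges vary widely across output channels, which is often cited as the reason per-tensor quantization struggles on MobileNet-style depthwise convolutions.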