
Did you improve the performance using per-channel weight quantization? #10

Open · talenz opened this issue Apr 14, 2021 · 4 comments

talenz commented Apr 14, 2021

Hi,
Great implementation! Since per-channel weight quantization is implemented in your code, I'm wondering whether it brings any improvement over per-tensor weight quantization.

zhutmost (Owner) commented Apr 14, 2021

I have tried it on ResNet/ImageNet, but I found that choosing the initial value of the hyperparameter s is very tricky.
I am not sure how I should modify the original expression (`self.s = t.nn.Parameter(x.detach().abs().mean() * 2 / (self.thd_pos ** 0.5))`). I have tried some variants, but they cannot reach an accuracy as high as the original one (i.e., without per-channel quantization).

(And I don't have enough GPUs to run many experiments. It eats a lot of my spare time. :D)
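
For reference, here is a minimal sketch of what a per-channel variant of that initialization could look like: one scale per output channel, with the LSQ gradient scale g = 1/sqrt(N·Q_p) counting only the weights of that channel. The class name, the `init_from` hook, and the `grad_scale`/`round_pass` straight-through helpers are illustrative assumptions, not necessarily the repository's actual code.

```python
import torch as t

def grad_scale(x, scale):
    # Straight-through: forward value is x, backward gradient is scaled by `scale`.
    return (x - x * scale).detach() + x * scale

def round_pass(x):
    # Straight-through rounding: forward rounds, backward passes the gradient as-is.
    return (x.round() - x).detach() + x

class LsqWeightQuan(t.nn.Module):
    """Illustrative per-channel LSQ weight quantizer (a sketch, not the repo's code)."""

    def __init__(self, bit=4, per_channel=True):
        super().__init__()
        self.thd_neg = -(2 ** (bit - 1))   # e.g. -8 for 4-bit signed weights
        self.thd_pos = 2 ** (bit - 1) - 1  # e.g. +7
        self.per_channel = per_channel
        self.s = t.nn.Parameter(t.ones(1))

    def init_from(self, x):
        if self.per_channel:
            # One scale per output channel: reduce |w| over every dim but dim 0,
            # then apply the same 2*mean(|w|)/sqrt(Q_p) rule channel-wise.
            mean_abs = x.detach().abs().mean(dim=tuple(range(1, x.dim())), keepdim=True)
            self.s = t.nn.Parameter(mean_abs * 2 / (self.thd_pos ** 0.5))
        else:
            self.s = t.nn.Parameter(x.detach().abs().mean() * 2 / (self.thd_pos ** 0.5))

    def forward(self, x):
        # Per-channel, N in g = 1/sqrt(N * Q_p) is the weight count per channel.
        n = x.numel() / x.shape[0] if self.per_channel else x.numel()
        g = 1.0 / ((n * self.thd_pos) ** 0.5)
        s = grad_scale(self.s, g)  # broadcasts over (C_out, 1, 1, 1) for conv weights
        x = t.clamp(x / s, self.thd_neg, self.thd_pos)
        return round_pass(x) * s
```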

talenz (Author) commented Apr 14, 2021

> I have tried it on ResNet/ImageNet, but I found that choosing the initial value of the hyperparameter s is very tricky.
> I am not sure how I should modify the original expression (`self.s = t.nn.Parameter(x.detach().abs().mean() * 2 / (self.thd_pos ** 0.5))`). I have tried some variants, but they cannot reach an accuracy as high as the original one (i.e., without per-channel quantization).
>
> (And I don't have enough GPUs to run many experiments. It eats a lot of my spare time. :D)

I used your implementation on MobileNetV2@ImageNet and quantized only the conv weights to 4-bit (the fc weights and activations stay in float). It didn't work well: even with per-channel quantization, the top-1 accuracy only reaches about 68% (the float baseline is 71.88%). Any advice?

zhutmost (Owner) commented

> > I have tried it on ResNet/ImageNet, but I found that choosing the initial value of the hyperparameter s is very tricky.
> > I am not sure how I should modify the original expression (`self.s = t.nn.Parameter(x.detach().abs().mean() * 2 / (self.thd_pos ** 0.5))`). I have tried some variants, but they cannot reach an accuracy as high as the original one (i.e., without per-channel quantization).
> > (And I don't have enough GPUs to run many experiments. It eats a lot of my spare time. :D)
>
> I used your implementation on MobileNetV2@ImageNet and quantized only the conv weights to 4-bit (the fc weights and activations stay in float). It didn't work well: even with per-channel quantization, the top-1 accuracy only reaches about 68% (the float baseline is 71.88%). Any advice?

You can try modifying 1) the scaling factor of the gradients, and 2) the initialization value of s. You can also read the follow-up paper, LSQ+ (https://arxiv.org/abs/2004.09576), which analyzes the shortcomings of LSQ and offers some advice.
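
For example, LSQ+ suggests a statistics-based weight-scale initialization in place of the mean-based one quoted above. A sketch of that formula, paraphrased from the paper with a hypothetical helper name, might be:

```python
import torch

def lsqplus_weight_scale_init(w: torch.Tensor, bit: int) -> torch.Tensor:
    """Weight-scale init from LSQ+ (arXiv:2004.09576), paraphrased:
    s = max(|mu - 3*sigma|, |mu + 3*sigma|) / 2^(bit - 1),
    so the roughly 3-sigma spread of the weights maps onto the signed grid."""
    mu, sigma = w.detach().mean(), w.detach().std()
    return torch.max((mu - 3 * sigma).abs(), (mu + 3 * sigma).abs()) / (2 ** (bit - 1))

# Hypothetical usage, replacing the expression quoted earlier:
# self.s = t.nn.Parameter(lsqplus_weight_scale_init(x, bit=4))
```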

talenz (Author) commented Apr 21, 2021

> > > I have tried it on ResNet/ImageNet, but I found that choosing the initial value of the hyperparameter s is very tricky.
> > > I am not sure how I should modify the original expression (`self.s = t.nn.Parameter(x.detach().abs().mean() * 2 / (self.thd_pos ** 0.5))`). I have tried some variants, but they cannot reach an accuracy as high as the original one (i.e., without per-channel quantization).
> > > (And I don't have enough GPUs to run many experiments. It eats a lot of my spare time. :D)
> >
> > I used your implementation on MobileNetV2@ImageNet and quantized only the conv weights to 4-bit (the fc weights and activations stay in float). It didn't work well: even with per-channel quantization, the top-1 accuracy only reaches about 68% (the float baseline is 71.88%). Any advice?
>
> You can try modifying 1) the scaling factor of the gradients, and 2) the initialization value of s. You can also read the follow-up paper, LSQ+ (https://arxiv.org/abs/2004.09576), which analyzes the shortcomings of LSQ and offers some advice.

Thanks~
