Gradient w.r.t the negative scaling factor might be wrong? #6

Open
magicwyzh opened this issue Mar 10, 2018 · 1 comment

@magicwyzh

In trained-ternary-quantization/utils/quantization.py, line 42:
The last value returned by the function "get_grads" is the gradient w.r.t. the negative scaling factor. I think the code might be wrong (though I'm not sure).

Consider the simple case where the kernel is a 1x1 tensor. According to my understanding, the forward pass computes

t = ternarize(fp_kernel),

where fp_kernel is the full-precision kernel and the function "ternarize" maps it to +1, -1, or 0: for the negative part of fp_kernel, t = -1, and for the positive part, t = +1.

The resulting negative part of the scaled ternary kernel is then:
y = w_n * t
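
For reference, here is a minimal sketch of this forward pass as I understand it (the threshold rule, the threshold value, and names like ternarize / fp_kernel / w_n are illustrative, not necessarily the exact code in the repo):

```python
import torch

def ternarize(fp_kernel, thresh=0.05):
    # Illustrative threshold rule mapping full-precision weights to {+1, -1, 0}
    pos = (fp_kernel > thresh).float()
    neg = (fp_kernel < -thresh).float()
    return pos - neg

fp_kernel = torch.tensor([[-0.3]])   # 1x1 full-precision kernel with a negative entry
t = ternarize(fp_kernel)             # -> tensor([[-1.]])
w_n = 0.5                            # negative scaling factor
y = w_n * t                          # negative part of the scaled kernel: tensor([[-0.5]])
```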

In my opinion, the gradient w.r.t. w_n should equal "grad_y * t", where "grad_y" is the gradient w.r.t. the negative part of the scaled ternary kernel and corresponds to "b*kernel_grad" in your code (line 42).

Because t = -1 for the negative part of the kernel, I think the gradient w.r.t. w_n should be
grad_y * t = grad_y * (-1) = -b*kernel_grad

This result indicates that the last return value of the function "get_grads" in your code should be
"(-b*kernel_grad).sum()"

Am I right?

@TropComplique
Owner

Yeah, sounds right.
But in the original paper they calculate the gradient the same way I do in my implementation (see page 4, equation 7). Maybe there is an error in the paper. Try writing to the authors.
