-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in ProjectedGradientDescent implementation? #1115
Comments
Which implementation are you referring to? |
MadryEtal uses ProjectedGradientDescent, which further uses FGM to compute the perturbation using the gradient. I could not find any step in the FGSM implementation that normalizes gradients (I may have missed this step in case it is there, in which case I apologize) |
@iamgroot42 I might be misunderstanding this but if by "normalize" you mean the projection step, I think it is in |
@michaelshiyu if you have a look here (the step() function is called inside the main attack), they normalize the gradient first. The 'clip_eta' step is for limiting the perturbation to the epsilon L-p ball, which is different from what I am talking about |
@iamgroot42 I don't think this normalization is mentioned in either of the papers referenced by CleverHans' PGD implementation. Although, I think both papers only discussed the inf-norm case unless I've missed something. Also, in their implementation, wouldn't this normalization step affect the following projection step at least when using 2-norm? For example, say if a certain gradient vector has 2-norm 0.5 and we set |
Page 18 of this paper by the same research lab references the Madry attack, saying that "we normalize the gradient at each step of PGD". If you look at their implementation in the robustness package (which is from the research lab which published the PGD paper), they do actually normalize the gradients (the link I gave in my previous comment). Since this library is maintained and used by the PGD authors themselves, I think we ought to have a similar implementation? What do you think @michaelshiyu @npapernot |
@iamgroot42 I was referring to the normalization in the robustness package. I might be wrong but I think it interferes with the projection step at least when we are using 2-norm. Is this a wanted effect? Also, I might be wrong again but I don't think the PGD attack was proposed by the Madry lab. They themselves referred to another paper in their ICLR2018 paper. |
I think they are aware of that fact, and that it may not be interference according to them. They normalize the gradients first and then do other projection steps. As far as the attack is concerned, well, the Cleverhans implementation does read 'MadryEtAl' :P This is why I am a bit confused: which one of these two is the correct implementation? |
@iamgroot42 Personally, I find it difficult to argue that such normalization is beneficial in any generic setting since it messes up the projection step. I took another look at the "Adversarial Examples Are Not Bugs" paper and I couldn't find a place where they discuss how this normalization helps. My take is that if you normalize gradient before projecting, you are not projecting the true gradient so you are not performing projected gradient descent as it is from optimization. Anyway, I guess they found this to be helpful in some way. But I think this is more of a heuristic thing than being correct or not. |
So in that case, the Cleverhans implementation is correct, and theirs is not (in the sense that it is not true to the original attack, and they should keep the normalization optional and use it only when needed, for example when implementing the mentioned paper)? |
@iamgroot42 I think at least in the 2-norm case, which is what they did for the robustness package, normalizing or not does not really change things since you can undo the effect of normalization by changing the step size for each example. But yes, I think the CleverHans implementation follows the strict definition of projected gradient descent as an optimization technique. With that said, I think @npapernot is much more qualified than me to say which one is correct. |
The original work by Madry et al mentions that in the optimization loop of their attack, they normalize the gradient before multiplying it with the step size. I may have missed some function calls made, but after having a look at the ProjectedGradientDescent implementation, it seems that at no stage are the gradients normalized before perturbing the image.
A recently released version code-base by the authors themselves for this attack uses gradient normalization as well: https://github.com/MadryLab/robustness/blob/master/robustness/attack_steps.py#L127
I am not sure if this missing step in the cleverhans implementation is a bug or is done effectively via some other function call/steps (maybe earlier/later somewhere in the code)?
The text was updated successfully, but these errors were encountered: