Hi,
(1) I found that the ST layer always scales down the input, so the output looks like a shrunken, transformed image on a black canvas.
(2) When the batch size is large (>300), the transformed output contains negative values on alternate iterations. I find this extremely bizarre.
Any help with these issues would be welcome! I'm running on a deadline, so a quick response would be much appreciated.
For (1), you could add an extra loss term that encourages the scaling factors in the transformation matrix to stay large. It is possible for the ST layer to generate black padding around the images; in my experiments this did not affect the final classification results on the MNIST and CUB datasets.
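As a minimal sketch of such a penalty, assuming a PyTorch-style ST layer whose localization network predicts an `(N, 2, 3)` affine matrix `theta` (the names and threshold below are illustrative, not from this repo):

```python
import torch

def scale_penalty(theta, min_scale=1.0):
    """Penalize affine transforms whose scale entries fall below
    `min_scale`, discouraging the ST layer from shrinking the input
    onto a black canvas. theta[:, 0, 0] and theta[:, 1, 1] are the
    x/y scaling factors of the (N, 2, 3) affine matrices."""
    sx = theta[:, 0, 0]
    sy = theta[:, 1, 1]
    # Hinge-style penalty: zero once the scale reaches min_scale.
    return (torch.relu(min_scale - sx) ** 2
            + torch.relu(min_scale - sy) ** 2).mean()

# total_loss = classification_loss + lambda_scale * scale_penalty(theta)
```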
For (2), sorry, I have no idea. I think it is unusual to use a batch size larger than 64 when processing images, and I have never encountered this problem. There should be no negative values if your inputs are all non-negative, since the ST layer only performs interpolation.
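If you want to verify this on your side, here is a quick sanity check, assuming the standard `affine_grid`/`grid_sample` forward pass of an ST layer in PyTorch (whether your layer is implemented this way is an assumption):

```python
import torch
import torch.nn.functional as F

x = torch.rand(4, 1, 28, 28)  # non-negative input batch
# A fixed down-scaling affine transform, repeated across the batch.
theta = torch.tensor([[0.5, 0.0, 0.1],
                      [0.0, 0.5, -0.1]]).repeat(4, 1, 1)

grid = F.affine_grid(theta, x.size(), align_corners=False)
out = F.grid_sample(x, grid, align_corners=False)

# Bilinear sampling is a convex combination of input pixels (plus
# zero padding), so non-negative inputs cannot yield negative outputs.
assert out.min().item() >= 0.0
```

If this assertion fails in your setup, the negative values are probably coming from somewhere other than the interpolation itself (e.g. a normalization step upstream of the ST layer).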
Following up on (2), I find that almost all of the affine parameters predicted by the localisation network result in very bizarre transformations (like negative x, y coordinates).
How do you make sure that the affine parameters stay well-behaved? I'm not sure there is a simple loss function that can ensure this.
Also, the original paper by Zisserman et al. doesn't mention using any loss on the thetas, so do you have any idea how their implementation might differ?
Thank you for your interest in my code.
I'm actually not sure how the original authors produced their results, but I'd guess the following rules should help:
Make the learning rate for the localization network smaller than the regular one. Making it 1e-2 ~ 1e-3 times smaller will help the predictions change slowly.
You may also add a penalty loss on the magnitude of the transformation to keep it small and smooth; a sketch of both ideas follows.
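Here is a minimal sketch of both rules, assuming a PyTorch model with separate `loc_net` and `classifier` submodules, and interpreting "magnitude of the transformation" as deviation from the identity transform (the dummy model and all names are illustrative assumptions, not this repo's code):

```python
import torch
import torch.nn as nn

# Stand-in model: a localization network that predicts the 2x3
# affine matrix, and a classifier (substitute your own modules).
class STNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.loc_net = nn.Linear(784, 6)
        self.classifier = nn.Linear(784, 10)

model = STNet()

# Rule 1: give the localization network a learning rate
# 1e-2 times smaller than the rest of the model.
optimizer = torch.optim.Adam([
    {"params": model.loc_net.parameters(), "lr": 1e-5},
    {"params": model.classifier.parameters(), "lr": 1e-3},
])

# Rule 2: penalize deviation of theta from the identity transform,
# so the predicted transformations stay small and smooth.
_identity = torch.tensor([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])

def theta_penalty(theta, weight=1e-2):
    # theta: (N, 2, 3) affine matrices from the localization network
    return weight * ((theta - _identity.to(theta.device)) ** 2).mean()

# total_loss = classification_loss + theta_penalty(theta)
```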