Convert RandomZoom to backend-agnostic and improve affine_transform
#574
Conversation
Is it because of the behavior of the interpolation in torch?
I believe so. I can dig deeper into it to identify the source of the differences.
The correctness tests failed for both zoom settings. The outputs:

```python
# bilinear (JAX, TF, NumPy)
# zoom -0.5
[[ 6.   6.5  7.   7.5  8. ]
 [ 8.5  9.   9.5 10.  10.5]
 [11.  11.5 12.  12.5 13. ]
 [13.5 14.  14.5 15.  15.5]
 [16.  16.5 17.  17.5 18. ]]
# zoom 0.5, 0.8
[[ 0.5999999  0.20000005  2.         3.7999997   3.4      ]
 [ 3.1        2.7         4.5        6.2999997   5.9      ]
 [10.6       10.2        12.        13.799999   13.4      ]
 [18.1       17.7        19.5       21.3        20.9      ]
 [20.6       20.2        22.        23.8        23.4      ]]

# bilinear (Torch)
# zoom -0.5
tensor([[ 4.5000,  5.0000,  5.5000,  6.0000,  6.5000],
        [ 7.0000,  7.5000,  8.0000,  8.5000,  9.0000],
        [ 9.5000, 10.0000, 10.5000, 11.0000, 11.5000],
        [12.0000, 12.5000, 13.0000, 13.5000, 14.0000],
        [14.5000, 15.0000, 15.5000, 16.0000, 16.5000]])
# zoom 0.5, 0.8
tensor([[ 0.2000,  0.6000,  2.4000,  4.0000,  3.0000],
        [ 3.9500,  4.3500,  6.1500,  7.7500,  6.7500],
        [11.4500, 11.8500, 13.6500, 15.2500, 14.2500],
        [18.9500, 19.3500, 21.1500, 22.7500, 21.7500],
        [18.9500, 19.3500, 21.1500, 22.7500, 21.7500]])
```
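A quick numpy check on the two `zoom -0.5` matrices above shows that the torch output differs from the TF/JAX/NumPy output by a constant 1.5 everywhere, which points to a fixed sub-pixel offset in the sampling grid rather than a different interpolation weighting:

```python
import numpy as np

# Both "zoom -0.5" outputs above are arithmetic sequences with step 0.5;
# rebuild them and compare element-wise.
tf_out = np.arange(6.0, 18.5, 0.5).reshape(5, 5)     # TF/JAX/NumPy result
torch_out = np.arange(4.5, 17.0, 0.5).reshape(5, 5)  # torch result

print(np.unique(tf_out - torch_out))  # [1.5]
```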
So the root cause is that interpolation is implemented differently in torch (and only in torch)? This is a bit surprising, because there is an exact pixel-level algorithm that corresponds to each interpolation mode. Is the difference visually noticeable on real images? The magnitude of the differences in the test arrays looks quite significant. If we conclude that the difference is small, we can roll with it and make a note in the docs that torch has numerical differences. If the difference is significant, we should explore alternative solutions.
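One way such a constant offset can arise is from the sampling-coordinate convention rather than the per-pixel weighting. The sketch below is purely illustrative (it is not any backend's actual code): a 1-D linear resize computed with "half-pixel centers" versus plain corner scaling produces shifted results from the very same weighting formula.

```python
import numpy as np

def linear_resize_1d(x, out_size, half_pixel=True):
    """Resize a 1-D array with linear interpolation.

    half_pixel=True uses the half-pixel-centers convention
    (src = (dst + 0.5) * scale - 0.5); False uses plain corner
    scaling (src = dst * scale). Backends differ on this choice.
    """
    in_size = len(x)
    scale = in_size / out_size
    dst = np.arange(out_size, dtype=np.float64)
    src = (dst + 0.5) * scale - 0.5 if half_pixel else dst * scale
    src = np.clip(src, 0.0, in_size - 1.0)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, in_size - 1)
    frac = src - lo
    return x[lo] * (1.0 - frac) + x[hi] * frac

x = np.arange(6, dtype=np.float64)
print(linear_resize_1d(x, 3, half_pixel=True))   # [0.5 2.5 4.5]
print(linear_resize_1d(x, 3, half_pixel=False))  # [0. 2. 4.]
```

The constant 0.5 shift between the two results mirrors the constant offset seen in the test matrices above.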
Actually, when I was implementing keras-core/keras_core/ops/image_test.py (lines 171 to 176 in 628db59), there was a comparison of images of size (600, 512). I think the difference is not visually noticeable. Using the code below with `KERAS_BACKEND=tensorflow` and `KERAS_BACKEND=torch` respectively:

```python
import matplotlib.cbook as cbook
import matplotlib.pyplot as plt

from keras_core.layers import RandomZoom

with cbook.get_sample_data("grace_hopper.jpg") as image_file:
    image = plt.imread(image_file)

layer = RandomZoom(
    height_factor=(-0.5, -0.5),
    width_factor=(-0.5, -0.5),
    fill_mode="constant",
    interpolation="nearest",
)
output1 = layer(image)

layer = RandomZoom(
    height_factor=(0.5, 0.5),
    width_factor=(0.8, 0.8),
    fill_mode="constant",
    interpolation="nearest",
)
output2 = layer(image)

fig, ax_dict = plt.subplot_mosaic([["A", "B", "C"]], figsize=(12, 4))
ax_dict["A"].set_title("Original")
ax_dict["A"].imshow(image)
ax_dict["B"].set_title("Zoom In")
ax_dict["B"].imshow(output1 / 255.0)
ax_dict["C"].set_title("Zoom Out")
ax_dict["C"].imshow(output2 / 255.0)
fig.tight_layout(h_pad=0.1, w_pad=0.1)
plt.savefig("affine.png")
```
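To go beyond eyeballing the saved figure, the two backends' outputs could also be compared pixel-wise. A minimal sketch, with hypothetical arrays standing in for the saved tensorflow and torch results:

```python
import numpy as np

def report_difference(a, b):
    """Summarize the pixel-wise difference between two images."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    diff = np.abs(a - b)
    return {
        "max_abs": float(diff.max()),
        "mean_abs": float(diff.mean()),
        "fraction_over_1": float((diff > 1.0).mean()),  # > one uint8 step
    }

# Hypothetical stand-ins; in practice, save the layer outputs from each
# backend run (e.g. with np.save) and load them here.
tf_like = np.zeros((4, 4))
torch_like = np.full((4, 4), 0.5)
print(report_difference(tf_like, torch_like))
```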
It seems that:
Thanks for the detailed info! I think we should go with option 1: roll with the difference. We should make sure to include the note in the docs.
I'm currently working on a more consistent interpolation method in affine_transform.

Sounds good!
After refactoring affine_transform, I put together a table of the supported options per backend. This table can be verified by the updated tests. Some notes:

Now, I have a question: I think there are two options.

What should I do?
```python
interpolation=interpolation.upper(),
fill_mode=fill_mode.upper(),
```

```python
coordinates = _compute_affine_transform_coordinates(x, transform)
ref_out = scipy.ndimage.map_coordinates(
```
Here I replaced `tf.raw_ops.ImageProjectiveTransformV3` with `scipy.ndimage.map_coordinates`, because scipy has more fill_mode options, which lets us test all backends.
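For reference, `tf.raw_ops.ImageProjectiveTransformV3` maps each output pixel (x, y) to the input point ((a0*x + a1*y + a2) / k, (a3*x + a4*y + a5) / k) with k = a6*x + a7*y + 1. The sketch below builds such coordinates with numpy and evaluates them with `scipy.ndimage.map_coordinates`; the helper name is illustrative and is not the `_compute_affine_transform_coordinates` used in the tests.

```python
import numpy as np
from scipy import ndimage

def projective_coords(transform, height, width):
    """Build (row, col) sampling coordinates for a single image from an
    8-parameter projective transform [a0, a1, a2, a3, a4, a5, a6, a7]."""
    a0, a1, a2, a3, a4, a5, a6, a7 = transform
    y, x = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    k = a6 * x + a7 * y + 1.0
    src_x = (a0 * x + a1 * y + a2) / k
    src_y = (a3 * x + a4 * y + a5) / k
    return np.stack([src_y, src_x])  # map_coordinates expects (row, col)

image = np.arange(25, dtype=np.float64).reshape(5, 5)
identity = [1, 0, 0, 0, 1, 0, 0, 0]
out = ndimage.map_coordinates(
    image, projective_coords(identity, 5, 5), order=1, mode="constant"
)
print(np.allclose(out, image))  # True: identity transform reproduces input
```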
Thanks for the analysis! I think we can make reflect the default. We cannot pick nearest or constant as the default in RandomZoom etc., because that would diminish the benefits of data augmentation. It's important that "new" areas in the augmented images be filled with content that is in-distribution and that minimizes visual discontinuity. Otherwise you train on information-free pixels (at best) or OOD pixels (at worst).
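The in-distribution argument can be illustrated on a 1-D signal with numpy's pad modes, which mirror the usual fill_mode options (this is an analogy, not the layer's code path):

```python
import numpy as np

signal = np.array([3.0, 5.0, 7.0, 9.0])

# "constant" fills the new area with information-free zeros, while
# "reflect" and "nearest" reuse nearby in-distribution content.
constant = np.pad(signal, 2, mode="constant", constant_values=0.0)
reflect = np.pad(signal, 2, mode="reflect")
nearest = np.pad(signal, 2, mode="edge")  # numpy's name for "nearest"

print(constant)  # [0. 0. 3. 5. 7. 9. 0. 0.]
print(reflect)   # [7. 5. 3. 5. 7. 9. 7. 5.]
print(nearest)   # [3. 3. 3. 5. 7. 9. 9. 9.]
```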
I have updated the PR with the solution you mentioned for affine_transform. All tests passed.

Thanks for the clarification! Now I understand the reasoning behind the default fill_mode.
Awesome work -- thank you for the great contribution!
Related to keras-team/keras#18442
EDITED:
please see #574 (comment)
=====OLD=====
The expected output in the correctness tests with the torch backend has changed. The root cause might be an inconsistency in the interpolation implementation.