Add random_transform_generator for augmenting images. #190
Conversation
Force-pushed from b0b0ccf to 7c5622e
# Transform the bounding boxes in the annotations.
annotations = annotations.copy()
for index in range(annotations.shape[0]):
    annotations[index, :4] = transform_aabb(transform, annotations[index, :4])
One thing I was wondering: What should the generator do if transform_aabb puts the annotation outside of the image canvas? I'm assuming they will automatically be filtered now by compute_input_output. But maybe I overlooked something?
Also, is that desirable? When the aabb ends up outside the image and it is filtered, that means that part of the image becomes background, even though it was actually an annotation.
Should we instead discard the entire image? Or should we try up to X times to get a transformation that doesn't put any annotations outside of the image (and discard the whole image if that failed X times)? Or should we clip the new bounding box to the image canvas?
My first thought is to clip the annotation to the image canvas, to preserve as much of the annotations as possible. What do you think?
I'm not sure. I have a slight feeling that it might reduce the quality of training in some cases. Maybe some very distinctive part of the thing to recognize is clipped off, or maybe the clipped thing looks more like a different object (contrived example: a face with a clown nose vs a face with a normal nose).
I think filtering it (the annotation) out entirely is more harmful than clipping. So either we make sure it doesn't occur, or we clip it in my opinion.
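A minimal sketch of the clipping option under discussion, assuming boxes are stored as [x1, y1, x2, y2] in pixel coordinates; the helper name clip_aabb is hypothetical here, and returning None stands in for the "filter the annotation out entirely" case:

import numpy as np

def clip_aabb(aabb, image_shape):
    # Clip an axis-aligned box [x1, y1, x2, y2] to the image canvas.
    # Returns None when no area is left inside the image, which
    # corresponds to filtering the annotation out entirely.
    height, width = image_shape[:2]
    x1, x2 = np.clip([aabb[0], aabb[2]], 0, width)
    y1, y2 = np.clip([aabb[1], aabb[3]], 0, height)
    if x1 >= x2 or y1 >= y2:
        return None
    return [x1, y1, x2, y2]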
Force-pushed from 7c5622e to 99c1b7b
Note: due to the holiday period, reviews are going to be slightly delayed. That also goes for this PR.
Looks good! Have a few questions in the comments, but this will be very useful.
PS: do you have a timing comparison of this method w.r.t. the method in Keras?
image = self.preprocess_image(image)

# randomly transform both image and annotations
if self.transform_generator is not None:
Why not use if self.transform_generator:?
No reason really. Do you prefer it like that?
Yeah.
Updated.
keras_retinanet/utils/transform.py (outdated)
    return [min_corner[0], min_corner[1], max_corner[0], max_corner[1]]


def _random_vector(min, max, prng = DEFAULT_PRNG):
Non-aligned default arguments should be written without spaces around the equals sign (same goes for some other places).
pep8 could be used for catching those spaces.
One problem with using pytest --pep8 for that is that we do use spaces around the parameter assignment on multi-line function calls for alignment. The pep8 test doesn't distinguish between that and this type of whitespace.
Anyway, fixed, I think.
keras_retinanet/utils/transform.py (outdated)
    ])


def random_translation(min, max, prng = DEFAULT_PRNG):
Do we want to distinguish between X/Y translation min/max?
For this module I would say yes. These are generic transform functions. My initial reasoning was that you might want to scale the max and min with the image width and height. That isn't necessary now since the data generator does the scaling itself after generating the transform.
Still, I think it's a valid use case for a generic transform function.
Damn you github, this comment thread is not outdated -.-
But this code handles X and Y the same way right? Using the same min/max?
No, min and max are 2D vectors (or a list/tuple/sequence of two elements).
Ah I see, I missed that part then. 👍
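To make that concrete, here is a minimal sketch of the per-axis behaviour under the assumptions above; translation is a hypothetical helper building a homogeneous 3x3 translation matrix, and numpy's uniform samples each axis independently when given two-element bounds:

import numpy as np

DEFAULT_PRNG = np.random

def translation(t):
    # Construct a homogeneous 2D translation matrix for t = (tx, ty).
    return np.array([
        [1, 0, t[0]],
        [0, 1, t[1]],
        [0, 0, 1],
    ])

def random_translation(min, max, prng=DEFAULT_PRNG):
    # min and max are 2D vectors, so X and Y get independent ranges:
    # e.g. min=(-10, -20), max=(10, 20) draws X from [-10, 10]
    # and Y from [-20, 20].
    return translation(prng.uniform(min, max))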
def shear(amount):
    """ Construct a homogeneous 2D shear matrix.
    # Arguments
        amount: the shear amount
Maybe make it a bit more clear what it is. Presumably it is the angle of the shear in radians?
It is an angle, though the matrix generated isn't actually a shear matrix. However, I copied it from keras since I figured we should probably use the same definition of shear as keras does, even if it is technically incorrect.
I believe the keras definition of shear compensates for the stretching that a real shear operation would do by reducing the Y component as the X component gets larger. So it's sort of halfway between a shear and a rotation.
Another thing I wasn't really happy about is that this shear function only shears parallel to the X axis. It's not very generic.
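For reference, a sketch of the two definitions being contrasted here, assuming the angle is in radians; the first matrix follows the Keras-style definition described above, while the second is a textbook shear parallel to the X axis:

import numpy as np

def keras_style_shear(angle):
    # Keras-style 'shear': the Y row is scaled by cos(angle), which
    # counteracts the stretching a true shear would cause, making
    # this halfway between a shear and a rotation.
    return np.array([
        [1, -np.sin(angle), 0],
        [0,  np.cos(angle), 0],
        [0, 0, 1],
    ])

def true_shear_x(angle):
    # A textbook shear parallel to the X axis: Y is left untouched.
    return np.array([
        [1, np.tan(angle), 0],
        [0, 1, 0],
        [0, 0, 1],
    ])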
keras_retinanet/utils/transform.py (outdated)
def random_shear(min, max, prng = DEFAULT_PRNG):
    """ Construct a random 2D shear matrix with shear angle between -max and max.
    # Arguments
        min: the minimum shear factor.
Same here.
keras_retinanet/utils/transform.py (outdated)
    ])


def random_scaling(min, max, prng = DEFAULT_PRNG):
Same here, do we want to distinguish between x/y?
As far as I can tell we only need to decide on what to do when an annotation is outside of the image canvas after transformation. When does this happen? Shouldn't the annotation always remain in the image canvas (aside from some rounding errors perhaps)?
No, rotations can easily put annotations completely or partly out of the canvas. The same is true for shearing and translation. Only flips don't have this problem really.
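To see why, here is a sketch of what transform_aabb plausibly does, consistent with the min_corner/max_corner return value quoted from keras_retinanet/utils/transform.py above: it maps all four corners through the transform and fits a new axis-aligned box around them, so nothing constrains the result to the image canvas.

import numpy as np

def transform_aabb(transform, aabb):
    # Transform an axis-aligned box [x1, y1, x2, y2] by a homogeneous
    # 3x3 matrix. All four corners are transformed, then a new
    # axis-aligned box is fitted around them; a rotated or sheared box
    # can therefore extend past the canvas even if the original did not.
    x1, y1, x2, y2 = aabb
    points = transform.dot(np.array([
        [x1, x2, x1, x2],
        [y1, y2, y2, y1],
        [1,  1,  1,  1],
    ]))
    min_corner = points.min(axis=1)
    max_corner = points.max(axis=1)
    return [min_corner[0], min_corner[1], max_corner[0], max_corner[1]]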
Two things:
Shall we move on to merging this, and worry about that edge case in a later PR?
If by the edge case you mean when bounding boxes end up outside the image, I think it should be fixed first. I have been getting that result with the current augmentations quite a lot, and it is a problem. I suppose clipping when possible, and not crashing otherwise, would be my preferred solution.
Crashing is not supposed to happen. I propose to indeed either clip for now, or leave it untouched. Personally I still believe clipping would be fine.
Yeah, I think crashing was actually fixed by a change a while back. I am in favour of merging this soon though! I have been ogling this branch for a while now, it would be awesome to be able to use this!
I also don't think it will crash now. I believe it will generate invalid annotations which will be filtered out automatically. For my part, we can make an issue to resolve the discarding/clipping problem and merge as is.
Force-pushed from 122cc31 to ca6f5b6
I tagged a 0.1 pre-release, since we seemed to be pretty stable. Rebased this branch on master. Once CI passes I'll merge.
CI passed, merging.
Discussion on what to do with the annotations partially transformed out of the canvas can continue at #223.
Add random_transform_generator for augmenting images.
This PR adds a random_transform_generator to be used for augmenting data. It replaces the use of keras.preprocessing.image.ImageDataGenerator.

The main reason for doing this is that ImageDataGenerator creates a random transformation internally and applies it directly to the image. That doesn't give us the chance to apply the same transformation to the bounding boxes.

The current workaround is to seed the global PRNG used by ImageDataGenerator with a random value and then have it transform first the image data, and then a mask for the bounding box (with the PRNG reset to the same random seed). The new bounding box is then determined from the transformed mask.

This is a rather expensive process, but also one that is slightly fragile to changes in ImageDataGenerator. It relies on the fact that ImageDataGenerator uses the global numpy PRNG, and that the number and order of generated random numbers is the same for the original image and the mask.
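A sketch of that workaround, for illustration; it assumes the Keras 2 ImageDataGenerator.random_transform(x, seed=None) signature, and the helper name and [x1, y1, x2, y2] box layout are made up here:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

def old_mask_workaround(generator, image, aabb):
    # Rasterize the box into a single-channel mask.
    x1, y1, x2, y2 = [int(round(v)) for v in aabb]
    mask = np.zeros(image.shape[:2] + (1,))
    mask[y1:y2, x1:x2] = 1.0

    # Transform image and mask with the same seed so both see the
    # exact same sequence of draws from the global PRNG.
    seed = np.random.randint(2 ** 31)
    image = generator.random_transform(image, seed=seed)
    mask = generator.random_transform(mask, seed=seed)

    # Recover the new box from the transformed mask (assumes some of
    # the mask survives the transform; an empty mask is not handled).
    ys, xs = np.nonzero(mask[..., 0] > 0.5)
    return image, [xs.min(), ys.min(), xs.max() + 1, ys.max() + 1]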
In contrast, the new random_transform_generator doesn't apply any transformations, it just generates them. The responsibility for applying the transforms to the image and bounding box has been moved into our own Generator with this PR.

random_transform_generator supports the following transformations: rotation, translation, shear, scaling, and X/Y flipping.

Notably, the one thing from ImageDataGenerator.random_transform it does not support is channel shifting, since that is not a homogeneous linear transformation in 2D.
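A minimal sketch of the generator pattern described above, with a translation-only transform for brevity; the real implementation composes more transform types, and the exact signatures here are illustrative:

import numpy as np

def random_transform(min_translation, max_translation, prng):
    # Build one random homogeneous 3x3 transform (translation only,
    # for brevity; the real generator also composes rotation, shear,
    # scaling and flips).
    tx, ty = prng.uniform(min_translation, max_translation)
    return np.array([
        [1, 0, tx],
        [0, 1, ty],
        [0, 0, 1],
    ])

def random_transform_generator(prng=None, **kwargs):
    # The generator only creates transforms; it never touches image
    # data. The data generator applies each yielded matrix to both the
    # image and its annotations, so they always see the same transform.
    if prng is None:
        # A private PRNG avoids the global-numpy-state fragility
        # described above.
        prng = np.random.RandomState()
    while True:
        yield random_transform(prng=prng, **kwargs)

# usage:
# transforms = random_transform_generator(min_translation=(-10, -10),
#                                         max_translation=(10, 10))
# transform = next(transforms)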