Adds efficientnet2 presets #1983
base: master
Conversation
Thanks Piseth! Left a few comments
```python
batch_norm_epsilon=1e-3,
activation="swish",
dropout=0.2,
nores=False,
```
add missing args in the docstring - for example: `nores`
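For context, a minimal sketch of what the added docstring entry might look like - the wording is an assumption based on the arg's name and default, not taken from this PR:

```python
"""
Args:
    nores: bool, whether to skip the residual connection in the block.
        Defaults to `False`.
"""
```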
done
```
@@ -0,0 +1,136 @@
import keras
```
rename the file to be more readable - e.g. `convolution_batch_norm_activation.py`
this conflicts with the review below, so I went with the later one.
```
@@ -0,0 +1,22 @@
import keras
```
rename file to be more readable
this conflicts with the review below, so I went with the later one.
```
@@ -77,6 +77,7 @@ def __init__(
    activation="swish",
    dropout=0.2,
    nores=False,
```
the docstring args don't match the arg list here - please make sure they are consistent
done
Having a partially "stackwise" arg that isn't named with a `stackwise_` prefix is confusing. Left a suggestion to make our arg names more consistent here, with less if-branching.
```python
}


class ConvBNActBlock(keras.layers.Layer):
```
Maybe just call this `CBABlock`? And rename the file to match? It's kind of confusing that we're using different forms of abbreviation here.
done
```python
        return x

    def get_config(self):
        config = {
```
we don't do this style generally - follow this: https://github.com/keras-team/keras-hub/blob/master/keras_hub/src/models/bert/bert_backbone.py#L200-L213
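For reference, the linked convention builds on `super().get_config()` and updates it with the layer's constructor args - a minimal sketch, where the specific keys are assumptions based on this block's args:

```python
def get_config(self):
    config = super().get_config()
    config.update(
        {
            # Keys are illustrative; mirror this layer's constructor args.
            "activation": self.activation,
            "dropout": self.dropout,
            "nores": self.nores,
        }
    )
    return config
```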
done
```
@@ -260,11 +271,26 @@ def __init__(
    name=block_name,
)
x = block(x)
else:  # cba block
```
Why do this? You extend `get_conv_constructor` to handle "cba", then don't use it. I'm confused.
Changing my mind mid-change 😄. Refactored to make block_kwargs more dynamic now.
```python
for i in range(len(stackwise_kernel_sizes)):
num_stacks = len(stackwise_kernel_sizes)

if isinstance(depth_coefficient, tuple):
```
This is confusing: most args that can be passed "stackwise" have a `stackwise_` prefix, except `depth_coefficient` now. It's probably better UX to just rename these first arguments `stackwise_depth_coefficient` and `stackwise_width_coefficient`. At least then our argument names are consistent. Remember to update docstrings.
You could keep backwards compat by adding a few lines at the top of the constructor here, something like this. You have to allow `stackwise_depth_coefficient=None` and `stackwise_width_coefficient=None` in the arg list.
```python
num_stacks = len(stackwise_kernel_sizes)
# Broadcast the legacy scalar coefficients to one value per stack.
if "depth_coefficient" in kwargs:
    stackwise_depth_coefficient = [kwargs.pop("depth_coefficient")] * num_stacks
if "width_coefficient" in kwargs:
    stackwise_width_coefficient = [kwargs.pop("width_coefficient")] * num_stacks
```
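For what it's worth, a standalone sketch of that broadcast with hypothetical values, showing both scalars expand to per-stack lists:

```python
stackwise_kernel_sizes = [3, 3, 3, 3, 3, 3]  # 6 stacks; values illustrative
kwargs = {"depth_coefficient": 0.9, "width_coefficient": 0.8}  # legacy style

num_stacks = len(stackwise_kernel_sizes)
if "depth_coefficient" in kwargs:
    stackwise_depth_coefficient = [kwargs.pop("depth_coefficient")] * num_stacks
if "width_coefficient" in kwargs:
    stackwise_width_coefficient = [kwargs.pop("width_coefficient")] * num_stacks

print(stackwise_depth_coefficient)  # [0.9, 0.9, 0.9, 0.9, 0.9, 0.9]
print(stackwise_width_coefficient)  # [0.8, 0.8, 0.8, 0.8, 0.8, 0.8]
```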
done
```
@@ -189,15 +236,22 @@ def port_batch_normalization(keras_layer, hf_weight_prefix):

# Stages
num_stacks = len(backbone.stackwise_kernel_sizes)

depth_coefficient = VARIANT_MAP[variant]["depth_coefficient"]
```
if we follow the suggestion about arg renaming above, we can get rid of this whole block. Just update the above from

```python
"width_coefficient": 0.8,
"depth_coefficient": 0.9,
```

to

```python
"stackwise_width_coefficient": [0.8] * 6,
"stackwise_depth_coefficient": [0.9] * 6,
```
I made it so both ways should work.
I should note I'm noticing a different prediction for one variant:

```
python tools/checkpoint_conversion/convert_efficientnet_checkpoints.py --preset efficientnet2_rw_s_ra2_imagenet
✅ Loaded TIMM model.
I1121 13:45:18.860764 8329934912 _builder.py:196] Loading pretrained weights from Hugging Face hub (timm/efficientnetv2_rw_s.ra2_in1k)
I1121 13:45:19.120539 8329934912 _hub.py:184] [timm/efficientnetv2_rw_s.ra2_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
✅ Loaded KerasHub model.
1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step
🔶 Keras output: [-0.575618   -0.4144789  -0.73163635 -0.8395867  -0.5637497   0.87436163
 -1.3687949  -0.1337489  -0.0662252   0.01332632]
🔶 TIMM output: [-0.7259099   0.3929606  -0.05580516  0.11954936  0.02080455  0.996287
  0.3669875   0.5366461   0.01827425 -0.37024903]
🔶 Keras label: 349
🔶 TIMM label: 345
🔶 Modeling difference: 0.52596486
🔶 Preprocessing difference: 0.0044535464
🏁 Preset saved to ./efficientnet2_rw_s_ra2_imagenet
```

which I'm looking into.
@pkgoogle thanks! Yeah, a different label and that level of difference in the output floats is worth looking into!
Should be good now:

```
python tools/checkpoint_conversion/convert_efficientnet_checkpoints.py --preset efficientnet2_rw_s_ra2_imagenet
✅ Loaded TIMM model.
I1125 14:46:30.043703 8329934912 _builder.py:196] Loading pretrained weights from Hugging Face hub (timm/efficientnetv2_rw_s.ra2_in1k)
I1125 14:46:30.252438 8329934912 _hub.py:184] [timm/efficientnetv2_rw_s.ra2_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
✅ Loaded KerasHub model.
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step
🔶 Keras output: [-0.37625     0.6832866   0.28495154  0.5878901   0.20605645  1.0627874
  0.40687525  0.40137056 -0.118939   -0.30322495]
🔶 TIMM output: [-0.7259099   0.3929606  -0.05580516  0.11954936  0.02080455  0.996287
  0.3669875   0.5366461   0.01827425 -0.37024903]
🔶 Keras label: 345
🔶 TIMM label: 345
🔶 Modeling difference: 0.15455398
🔶 Preprocessing difference: 0.0044535464
🏁 Preset saved to ./efficientnet2_rw_s_ra2_imagenet
```
Looking good! Just a few little comments.
```python
BN_AXIS = 3

CONV_KERNEL_INITIALIZER = {
```
Kinda funky to return this as a dict - any particular reason for it? More in line with other models would be to just add a function here...

```python
def conv_kernel_initializer(scale=2.0):
    return keras.initializers.VarianceScaling(
        scale=scale,
        ...
    )
```
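A possible completion of that sketch, assuming the standard EfficientNet conv initializer config (the `mode` and `distribution` values below are assumptions based on the usual Keras implementation, not taken from this PR):

```python
import keras

def conv_kernel_initializer(scale=2.0):
    # Assumed config mirroring EfficientNet's typical conv initializer.
    return keras.initializers.VarianceScaling(
        scale=scale,
        mode="fan_out",
        distribution="truncated_normal",
    )
```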
No particular reason, I was following the original style of the fusedmbconv/mbconv blocks. Done.
```
@@ -99,8 +106,8 @@ class EfficientNetBackbone(FeaturePyramidBackbone):
def __init__(
    self,
    *,
    width_coefficient,
    depth_coefficient,
    stackwise_width_coefficient=None,
```
probably make these plural, in keeping with other args? `stackwise_width_coefficients` and `stackwise_depth_coefficients`
done
"num_features": 1792, | ||
}, | ||
"rw_s": { | ||
"width_coefficient": 1.0, |
I'd pass these the new way instead of the legacy and now undocumented path: `"stackwise...": [1.0] * 6`
done
```python
},
"rw_s": {
    "width_coefficient": 1.0,
    "depth_coefficient": 1.0,
```
same here, and elsewhere in this file
done
```python
activation: activation function to use between each convolutional layer.
input_shape: optional shape tuple, it should have exactly 3 input
    channels.
stackwise_width_coefficient: list[float] or float, scaling coefficient
```
doesn't this need to be a `list[float]`?
correct, done
Adds these variants:
Branched from Edge Presets: #1976
Can merge just this one and close that one, or merge that one first for more modularity.