Why do we need this code snippet for training MAE? #44

redagavin · 2024-08-25T23:19:20Z

Hi,
could you please explain why we need this code snippet when training MAE? Why is apply_patch from timm.py not enough?
Thank you!

dbolya · 2024-08-26T00:20:34Z

MAE models are trained with global average pooling at the end instead of a class token. Since we're changing the size of each token by merging them together, we need to perform this global average pool with a weight proportional to the size of each token.

This is the same thing merge_wavg does when merging tokens together; here the weighted average is just taken globally, over all tokens.
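
To make the point concrete, here is a minimal sketch of a size-weighted global average pool. It uses NumPy rather than PyTorch for brevity, and the function name `weighted_global_pool`, the tensor shapes, and the toy values are assumptions made for illustration, not code from the repository.

```python
import numpy as np

def weighted_global_pool(tokens, sizes):
    """Size-weighted mean over the token axis.

    tokens: (B, N, C) token features after merging (hypothetical layout)
    sizes:  (B, N, 1) number of original patches each token represents
    """
    return (tokens * sizes).sum(axis=1) / sizes.sum(axis=1)

# Four original patch tokens with a single channel.
original = np.array([[[1.0], [3.0], [5.0], [7.0]]])   # shape (1, 4, 1)
true_mean = original.mean(axis=1)                      # plain global average pool

# Suppose the first three patches were merged into one token (their average),
# so that token now has size 3 and the remaining token has size 1.
merged = np.array([[[3.0], [7.0]]])                    # shape (1, 2, 1)
sizes = np.array([[[3.0], [1.0]]])                     # shape (1, 2, 1)

naive = merged.mean(axis=1)                            # ignores token sizes
weighted = weighted_global_pool(merged, sizes)         # accounts for token sizes

print("unmerged mean:", true_mean)
print("naive pooled: ", naive)
print("size-weighted:", weighted)
```

In this toy case the unweighted mean over the merged tokens drifts away from the mean over the original patches, while the size-weighted version recovers it. That mismatch is what the extra snippet for MAE corrects.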
