I use data that originally comes from the Kaggle competition “Happywhale - Whale and Dolphin Identification”. To work with a more structured and easier-to-use version, I use the resized whale and dolphin image dataset contributed by “RDIZZL3”, in which the images have been resized to 128×128 pixels. It contains 3k training images and 1k test images covering two species: bottlenose dolphin and killer whale.
@article{dosovitskiy2020vit,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
journal={ICLR},
year={2021}
}