Fine-grained Species Recognition with Privileged Pooling: Better Sample Efficiency Through Supervised Attention
Official implementation of our TPAMI 2023 paper (open access), also available on arXiv.
We propose a scheme for supervised image classification that uses privileged information, in the form of keypoint annotations for the training data, to learn strong models from small and/or biased training sets.
Predicted classes using privileged pooling on the CCT20-Cis test dataset (top) and the iBirds2017 test dataset (bottom). Bounding boxes are computed from the predicted attention maps. Attention maps (cropped to the bounding box for visualization) depict the encoded privileged information from the different keypoints provided at training time. The bottom right-most attention map is not supervised by any keypoint and acts as a complement to the other animal regions.
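One simple way to turn an attention map into a bounding box, as in the figure above, is to threshold the map at a fraction of its maximum and take the extent of the surviving pixels. This is an illustrative sketch, not the repository's code; the function name and the 0.5 threshold are assumptions:

```python
import numpy as np

def attention_bbox(att, rel_thresh=0.5):
    """Bounding box (x_min, y_min, x_max, y_max) of pixels whose
    attention exceeds rel_thresh * max(att)."""
    mask = att >= rel_thresh * att.max()
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

# toy attention map with a hot 3x3 region
att = np.zeros((8, 8))
att[2:5, 3:6] = 1.0
print(attention_bbox(att))  # (3, 2, 5, 4)
```

In practice the boxes drawn in the figure come from the predicted attention maps of the trained model; the relative threshold controls how tightly the box hugs the attended region.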
Dataset | Training set | Privileged information | Test set
---|---|---|---
CUB200-2011 | 5,994 training samples from CUB200-2011 | Keypoint annotations for all training samples, provided by the dataset authors | 5,794 test samples from CUB200-2011; iBirds2017: subset of iNaturalist2017 with the test samples from species present in CUB200-2011 (see the iNaturalist2017 - CUB200 taxonomy mapping in data/classes_taxonomy_CUB.csv)
CCT20 | 57,000 training samples from CameraTrapDataset20 | CameraTrapDataset20+: keypoint annotations for 1,182 training samples (data/keypoints_cct20plus.csv) | 15,000 Cis and 23,000 Trans test samples from CCT20
iBirds2018 | 143,950 training samples from the Aves supercategory of iNaturalist2018 | iBirds2018+: keypoint annotations for 1,014 training samples (data/keypoints_inat2018.csv) | 3,744 test samples from the Aves supercategory
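The keypoint CSV files listed above can be inspected with pandas. The column names in this sketch (image_id, keypoint, x, y) are hypothetical, for illustration only; check the actual headers in data/keypoints_cct20plus.csv and data/keypoints_inat2018.csv:

```python
import io
import pandas as pd

# Hypothetical schema standing in for the real annotation files.
csv_text = """image_id,keypoint,x,y
img_001,head,120,45
img_001,tail,300,200
img_002,head,88,60
"""
df = pd.read_csv(io.StringIO(csv_text))

# Group the annotated keypoints per training image.
per_image = df.groupby("image_id")["keypoint"].apply(list).to_dict()
print(per_image)  # {'img_001': ['head', 'tail'], 'img_002': ['head']}
```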
- Prepare the data:
  - Update the function get_basepath to point to your data folder; it defaults to /scratch/.
  - Download CUB200-2011 into the corresponding folder.
  - For keypoint annotations we used the annotate_keypoints functions in the data/DATASET.py scripts.
- Install the required packages:

```bash
pip3 install -r requirements.txt
```
- Training
To train a model with first-order pooling, simply run:

```bash
# iBirds2018
python train_img_class.py --dataset=inat2018 --model attention_map
# CUB200 with iBirds2017 as additional test set
python train_img_class.py --dataset=birds --model attention_map
# CCT20
python train_img_class.py --dataset=cct --model attention_map
```
We used an NVIDIA Titan X for all our experiments.
- Inference
```bash
python train_img_class.py --test-only --dataset inat2018 --model attention_map --resume SAVE_DIR/checkpoint.pth.tar
```
Privileged Pooling (PrPool) illustration: M attention maps, of which K are keypoint-supervised and Q are complementary.
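The pooling step in the illustration can be sketched as attention-weighted average pooling (the first-order case). This is a minimal NumPy sketch, not the repository's PyTorch implementation; the shapes, names, and sum-to-one normalization are illustrative assumptions:

```python
import numpy as np

def privileged_pool(features, attention):
    """Attention-weighted average pooling (first-order sketch).

    features:  (C, H, W) feature map from the backbone
    attention: (M, H, W) attention maps, M = K keypoint-supervised + Q complementary
    Returns a concatenated (M * C,) descriptor: one C-dim vector per attention map.
    """
    # Normalize each attention map so its spatial weights sum to one.
    att = attention / attention.sum(axis=(1, 2), keepdims=True)
    # Weighted spatial average of the features under each map.
    pooled = np.einsum('mhw,chw->mc', att, features)
    return pooled.reshape(-1)

C, H, W, K, Q = 8, 7, 7, 4, 1
rng = np.random.default_rng(0)
feats = rng.random((C, H, W))
atts = rng.random((K + Q, H, W))
desc = privileged_pool(feats, atts)
assert desc.shape == ((K + Q) * C,)
```

With a single uniform attention map this reduces exactly to global average pooling, which is why AvgPool is the natural baseline to compare against.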
Top-1 accuracy on the CUB200 (left) and iBirds2017 (right; the CUB species subset of iNaturalist2017) test datasets. Mean accuracy and standard deviation (error bars) over 5 runs. Gray bars are the baseline methods AvgPool and CovPool; green indicates our methods trained with PrPool (ours). Bars marked with ⋆ indicate the use of privileged information at training time (x⋆ methods). LSTM values are taken directly from [53]; TransFG-21k uses [54] pretrained on ImageNet-21k, and TransFG-1k is pretrained on ImageNet-1k.
Mean per-class accuracy and precision for iBirds2018 (the bird species subset of iNaturalist2018). Results are grouped into subsets according to the number of available training samples per class.