cvl-umass/AdaptCLIPZS

AdaptCLIPZS

This is the codebase for the 14-dataset zero-shot classification benchmark proposed in

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

Oindrila Saha, Grant Van Horn, Subhransu Maji

CVPR'24

Preparation

Create a conda environment from the provided specification:

conda env create -f environment.yml
conda activate adaptclipzs

Follow DATASETS.md of VDT-Adapter to download the datasets and JSON files. Additionally, download iNaturalist21, NABirds, CUB and Flowers102 from their respective sources. Then extract all CUB images into a single folder by running:

mkdir -p <path to cub data>/images_extracted
cd <path to cub data>/images/
for folder in *; do mv "$folder"/* ../images_extracted/; done
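The same flattening step can be sketched in Python for systems without a POSIX shell. This is a minimal sketch, not part of the repository: the function name `flatten_class_folders` is hypothetical, and it assumes image file names are unique across class folders (it stops rather than overwrite a file).

```python
import shutil
from pathlib import Path


def flatten_class_folders(images_dir: Path, out_dir: Path) -> int:
    """Move every file from the per-class subfolders of images_dir into out_dir.

    Returns the number of files moved. Hypothetical helper, not repository code.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    moved = 0
    for class_dir in sorted(p for p in images_dir.iterdir() if p.is_dir()):
        for image in sorted(class_dir.iterdir()):
            if not image.is_file():
                continue  # skip nested directories
            target = out_dir / image.name
            if target.exists():
                # Assumption: names are unique across classes; refuse to overwrite.
                raise FileExistsError(f"{target} already exists")
            shutil.move(str(image), str(target))
            moved += 1
    return moved
```

Call it with the `images` folder of the CUB download as `images_dir` and a sibling `images_extracted` folder as `out_dir`.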

Generate attributes from OpenAI GPT (optional)

We provide our generated attributes for all datasets in the "gpt_descriptions" folder. It contains one folder per dataset, named in the format <gpt_version>_<Dataset Name>. Each dataset folder contains one text file per class, named after the class name. You can also reproduce the process by running
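The folder layout described above can be read with a few lines of Python. This is a minimal sketch under two assumptions that the README does not state: the class files carry a `.txt` extension, and each non-empty line holds one attribute. The function name `load_descriptions` is hypothetical.

```python
from pathlib import Path


def load_descriptions(dataset_dir: Path) -> dict[str, list[str]]:
    """Map each class name (the file stem) to the non-empty lines of its file.

    Assumptions: files end in .txt and hold one attribute per line.
    """
    descriptions = {}
    for text_file in sorted(dataset_dir.glob("*.txt")):
        lines = text_file.read_text(encoding="utf-8").splitlines()
        descriptions[text_file.stem] = [ln.strip() for ln in lines if ln.strip()]
    return descriptions
```

For example, `load_descriptions(Path("gpt_descriptions/gpt4_0613_api_CUB"))` would return a dictionary keyed by class name, if the files follow the assumed format.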

python generate_gpt.py --api_key <your_api_key> --dataset StanfordCars --location --im_dir <path to directory containing images of StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --gpt_version gpt4_0613_api

The above command generates attributes for the StanfordCars dataset. The same command can be used to generate descriptions for all 14 datasets by changing the dataset, im_dir and json_file arguments. You do not need to provide json_file for the CUB, NABirds and iNaturalist datasets. The location argument indicates whether to generate attributes describing where a category is found. In the paper we use this for the natural domains, i.e. CUB, NABirds, iNaturalist21 and Flowers102.

This will save the attributes in a folder named <gpt_version>_<dataset> inside AdaptCLIPZS.
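The per-dataset rules above (json_file not needed for CUB, NABirds and iNaturalist; location used for natural domains in the paper) can be captured in a small command builder. This is a sketch, not repository code: the flags come from the command shown above, but `build_generate_command` is a hypothetical helper, and the `--dataset` spellings other than StanfordCars and CUB are assumptions.

```python
import shlex

# Assumption: these spellings match the script's --dataset values; only
# "StanfordCars" and "CUB" appear in the README's own commands.
NATURAL_DOMAINS = {"CUB", "NABirds", "iNaturalist21", "Flowers102"}
NO_JSON_NEEDED = {"CUB", "NABirds", "iNaturalist21"}


def build_generate_command(dataset, im_dir, api_key, json_file=None,
                           gpt_version="gpt4_0613_api"):
    """Return the argument list for one generate_gpt.py run."""
    cmd = ["python", "generate_gpt.py", "--api_key", api_key,
           "--dataset", dataset, "--im_dir", im_dir,
           "--gpt_version", gpt_version]
    if dataset in NATURAL_DOMAINS:
        # Location attributes are requested only for natural domains here,
        # following the paper's setup as described in the README.
        cmd.append("--location")
    if dataset not in NO_JSON_NEEDED:
        if json_file is None:
            raise ValueError(f"{dataset} requires a json_file")
        cmd += ["--json_file", json_file]
    return cmd


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell line."""
    return shlex.join(cmd)
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, or printed with `as_shell` for inspection first.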

Fine-tuning CLIP

For non-natural domains, run

python finetune_clip.py --dataset StanfordCars --im_dir <path to directory containing StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --fewshot --arch ViT-B/16 --save_dir ./ft_clip_cars --text_dir ./gpt4_0613_api_StanfordCars

For natural domains, i.e. CUB, iNaturalist, Flowers102 and NABirds, run

python finetune_clip_nat.py --dataset CUB --im_dir <path to directory containing CUB> --fewshot --arch ViT-B/16 --save_dir ./ft_clip_cub --text_dir_viz ./gpt4_0613_api_CUB --text_dir_loc ./gpt4_0613_api_CUB_location

The fewshot argument selects training on 16 images per class instead of the whole dataset. You can also specify hyperparameters including main_lr, main_wd, proj_lr, proj_wd and tau.
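To illustrate what a 16-shot training subset means, the sketch below keeps at most 16 randomly chosen images per class from a list of (path, label) pairs. It is a generic illustration, not the repository's selection code, which may follow fixed splits from the JSON files; `sample_fewshot` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def sample_fewshot(samples, shots=16, seed=0):
    """Keep at most `shots` (path, label) pairs per label, chosen at random.

    Classes with fewer than `shots` images are kept in full.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        paths = by_label[label]
        chosen = paths if len(paths) <= shots else rng.sample(paths, shots)
        subset.extend((p, label) for p in chosen)
    return subset
```

Seeding the generator keeps the subset identical across runs, which matters when comparing hyperparameter settings on the same few-shot split.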

Testing

The following command performs evaluation for the CLIPFT+A setup

python test_AdaptZS.py --dataset StanfordCars --im_dir <path to directory containing StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --arch ViT-B/16 --ckpt_path <path to fine-tuned checkpoints> --text_dir ./gpt4_0613_api_StanfordCars --attributes

To test vanilla CLIP, add the --vanillaCLIP argument; to test without GPT attributes, omit --attributes. For natural domains, also provide the path to the location attributes via the text_dir_loc argument.

Note: For CUB, Flowers, NABirds, INat and ImageNet, im_dir must be set to the path of the dataset folder itself, while for the remaining datasets it must be set to the directory that contains the dataset folder.
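The im_dir rule in the note can be written down as a small helper. This is a sketch only: `resolve_im_dir` is a hypothetical function, and the dataset names are copied from the note, so they may not match the exact `--dataset` values the scripts expect.

```python
from pathlib import Path

# Datasets for which the note says im_dir is the dataset folder itself.
# Assumption: spellings are taken from the note, not from the scripts.
DATASET_FOLDER_AS_IM_DIR = {"CUB", "Flowers", "NABirds", "INat", "ImageNet"}


def resolve_im_dir(dataset: str, dataset_folder: Path) -> Path:
    """Return the path to pass as --im_dir for a dataset stored in dataset_folder."""
    if dataset in DATASET_FOLDER_AS_IM_DIR:
        return dataset_folder
    # All other datasets expect the directory that contains the dataset folder.
    return dataset_folder.parent
```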

Pre-trained Checkpoints

We provide pre-trained checkpoints for the iNaturalist21, NABirds and CUB datasets for both the ViT-B/16 and ViT-B/32 architectures, which can be downloaded here.

You can run the following command with a pre-trained checkpoint to reproduce the test performance on the CUB dataset.

python test_AdaptZS.py --im_dir <path to directory containing CUB> --ckpt_path ./INaturalist21_b16.pth --text_dir ./gpt_descriptions/gpt4_0613_api_CUB/ --text_dir_loc ./gpt_descriptions/gpt4_0613_api_CUB_location/ --arch ViT-B/16 --attributes

You can replace --ckpt_path with any of the other checkpoints, making sure you provide the corresponding architecture in --arch. The following table shows the accuracies for the various checkpoints.

Model Accuracy
INaturalist21_b32.pth 54.54
INaturalist21_b16.pth 56.76
NABirds_b32.pth 55.46
NABirds_b16.pth 56.59
CUB_b32.pth 54.23
CUB_b16.pth 56.01
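Because the checkpoint names listed above end in `_b16` or `_b32`, the matching `--arch` value can be derived from the file name to avoid pairing a checkpoint with the wrong backbone. This is a sketch that relies only on that naming pattern; `arch_for_checkpoint` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Suffixes seen in the provided checkpoint names, mapped to --arch values.
ARCH_BY_SUFFIX = {"b16": "ViT-B/16", "b32": "ViT-B/32"}


def arch_for_checkpoint(ckpt_path: str) -> str:
    """Infer the --arch value from a checkpoint name such as CUB_b16.pth."""
    suffix = Path(ckpt_path).stem.rsplit("_", 1)[-1]
    if suffix not in ARCH_BY_SUFFIX:
        raise ValueError(f"cannot infer architecture from {ckpt_path!r}")
    return ARCH_BY_SUFFIX[suffix]
```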

Citation

If you find our work useful, please consider citing:

@inproceedings{saha2024improved,
  title={Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions},
  author={Saha, Oindrila and Van Horn, Grant and Maji, Subhransu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={17542--17552},
  year={2024}
}

Thanks to CoOp and VDT-Adapter for releasing the code bases on which our code is built.
