Understanding Defensive Strategies for Adversarial Attacks on Large Vision Language Models (AC297R Capstone Project)
This repository is derived from https://github.com/euanong/image-hijacks and is used to generate adversarial image attacks against the LLaVA-7b-chat model.
The code can be run in any environment with Python 3.9 or above.
We use Poetry for dependency management, which can be installed following the instructions here.
To install Poetry and build a virtual environment with the required packages, run:
curl -sSL https://install.python-poetry.org | python3 -
poetry install
To run the code in this repository effectively, the following resources are recommended (a quick check is sketched after this list):
- Disk storage: at least 35 GB of free disk space.
- GPU memory: at least 30 GB of GPU RAM; a GPU equivalent to or better than an NVIDIA L40 should be sufficient.
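The following sketch is one way to verify both requirements; it assumes PyTorch with CUDA is installed and is not part of this repository.

# Quick resource check (assumes PyTorch with CUDA; not part of the repo)
import shutil
import torch

free_disk_gb = shutil.disk_usage(".").free / 1e9
gpu_gb = torch.cuda.get_device_properties(0).total_memory / 1e9 if torch.cuda.is_available() else 0.0
print(f"Free disk: {free_disk_gb:.1f} GB (>= 35 GB recommended)")
print(f"GPU memory: {gpu_gb:.1f} GB (>= 30 GB recommended)")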
We show step by step how to generate adversarial images for the LLaVA-7b-chat model. The pretrained LoRA weights can be downloaded from this Hugging Face repo. By default, the adversaries are trained by adding epsilon-constrained noise over the whole image.
To train these images, first download the LLaVA checkpoint:
poetry run python download.py models llava-llama-2-7b-chat
To get the list of jobs (with their job IDs) specified by this config file:
poetry run python experiments/exp_demo_imgs/config.py
To run job ID N without wandb logging:
# run w/o wandb
poetry run python run.py train \
--config_path experiments/exp_demo_imgs/config.py \
--log_dir experiments/exp_demo_imgs/logs \
--job_id N \
--playground
To run job ID N with wandb logging to YOUR_WANDB_ENTITY/YOUR_WANDB_PROJECT:
# log in HF using API key
pip install transformers[cli]
huggingface-cli login
# run w/ wandb
poetry run python run.py train \
--config_path experiments/exp_demo_imgs/config.py \
--log_dir experiments/exp_demo_imgs/logs \
--job_id N \
--wandb_entity YOUR_WANDB_ENTITY \
--wandb_project YOUR_WANDB_PROJECT \
--no-playground
To train the adversaries with static or moving patches, we simply need to change the following function, sweep_patches.
def sweep_patches(cur_keys: List[str]) -> List[Transform]:
    return [
        Transform(
            [
                cfg.proc_learnable_image,
                lambda c: cfg.set_input_image(c, EIFFEL_IMAGE),
            ],
            "pat_full",
        )
    ]
Here, we can change cfg.proc_learnable_image to cfg.proc_patch_static for static patches, or to cfg.proc_patch_random_loc for moving patches. These functions can be found in config.py.
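As an example, a minimal sketch of the static-patch variant is shown below; the swap to cfg.proc_patch_static follows the text above, while the "pat_static" label string and the assumption that the remaining arguments stay unchanged are purely illustrative.

# Illustrative static-patch variant (the "pat_static" label is hypothetical)
def sweep_patches(cur_keys: List[str]) -> List[Transform]:
    return [
        Transform(
            [
                cfg.proc_patch_static,  # or cfg.proc_patch_random_loc for moving patches
                lambda c: cfg.set_input_image(c, EIFFEL_IMAGE),
            ],
            "pat_static",
        )
    ]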
To apply defense mechanisms to adversarial images, navigate to the notebooks/image_defense.ipynb notebook. There is an option to choose between six defenses:
- Rescaling / Resizing
- JPEG Compression
- Cropping
- Gaussian Noise
- Color Bit Depth Reduction
- Total Variation Denoising
Follow the notebook comments to generate defense images.
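As a rough illustration of what these defenses do, the sketch below implements three of them (JPEG compression, Gaussian noise, and color bit depth reduction) with PIL and NumPy; the function names and default parameters are hypothetical and do not come from the notebook.

# Hypothetical helpers illustrating three of the defenses (names and defaults are not from the notebook)
from io import BytesIO

import numpy as np
from PIL import Image


def jpeg_compress(img: Image.Image, quality: int = 75) -> Image.Image:
    # Re-encode the image as JPEG to destroy high-frequency adversarial noise.
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")


def add_gaussian_noise(img: Image.Image, std: float = 8.0) -> Image.Image:
    # Add zero-mean Gaussian noise (in 0-255 pixel units) and clip back to the valid range.
    arr = np.asarray(img).astype(np.float32)
    noisy = arr + np.random.normal(0.0, std, size=arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))


def reduce_bit_depth(img: Image.Image, bits: int = 4) -> Image.Image:
    # Quantize each color channel to `bits` bits, removing fine-grained perturbations.
    arr = np.asarray(img)
    shift = 8 - bits
    return Image.fromarray(((arr >> shift) << shift).astype(np.uint8))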
For better interpretability, visualizations are created to examine the locations of clean, adversarial, and defense images in the embedding space, as well as to highlight important regions of the images. Three types of visualizations are generated:
- PCA Plots of Clean, Adversarial, and Defense Image Embeddings (in the notebooks/image_embedding_7b.ipynb notebook)
- Histograms of L2-Norm and Cosine Similarities of Clean, Adversarial, and Defense Image Embeddings (in the notebooks/image_embedding_7b.ipynb notebook)
- Saliency Maps of Clean and Adversarial Images (in the notebooks/saliency-map-experiment.ipynb notebook)
Follow the notebook comments to create relevant visualizations.
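For reference, here is a minimal sketch of the PCA-style embedding plot, assuming you have already extracted (N, D) NumPy arrays of image embeddings for the clean, adversarial, and defense images; the variable names and plotting details are hypothetical and differ from the notebook.

# Minimal PCA plot sketch; `clean_emb`, `adv_emb`, and `def_emb` are assumed (N, D) arrays
# of image embeddings and do not correspond to variables in the notebook.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA


def plot_embedding_pca(clean_emb, adv_emb, def_emb, out_path="pca_embeddings.png"):
    all_emb = np.concatenate([clean_emb, adv_emb, def_emb], axis=0)
    proj = PCA(n_components=2).fit_transform(all_emb)
    n_clean, n_adv = len(clean_emb), len(adv_emb)
    plt.figure()
    plt.scatter(proj[:n_clean, 0], proj[:n_clean, 1], label="clean")
    plt.scatter(proj[n_clean:n_clean + n_adv, 0], proj[n_clean:n_clean + n_adv, 1], label="adversarial")
    plt.scatter(proj[n_clean + n_adv:, 0], proj[n_clean + n_adv:, 1], label="defense")
    plt.legend()
    plt.savefig(out_path)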
@misc{bailey2023image,
title={Image Hijacks: Adversarial Images can Control Generative Models at Runtime},
author={Luke Bailey and Euan Ong and Stuart Russell and Scott Emmons},
year={2023},
eprint={2309.00236},
archivePrefix={arXiv},
primaryClass={cs.LG}
}