Understanding Defensive Strategies for Adversarial Attacks on Large Vision Language Models (AC297R Capstone Project)
This repository is derived from https://github.com/euanong/image-hijacks and is used to generate adversarial image attacks against the LLaVA-7b-chat model.
The code can be run in any environment with Python 3.9 or above.
We use Poetry for dependency management, which can be installed following the instructions here.
To install Poetry and build a virtual environment with the required packages, run:
curl -sSL https://install.python-poetry.org | python3 -
poetry install
To run the code in this repository effectively, the following resources are recommended (a quick check is sketched after this list):
- Disk storage: at least 35 GB of free disk space.
- GPU memory: at least 30 GB of GPU RAM; a GPU equivalent to or better than an NVIDIA L40 should be sufficient.
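The following sketch is one way to verify both requirements; it assumes PyTorch with CUDA is installed and is not part of this repository.

# Quick resource check (assumes PyTorch with CUDA; not part of the repo)
import shutil
import torch

free_disk_gb = shutil.disk_usage(".").free / 1e9
gpu_gb = torch.cuda.get_device_properties(0).total_memory / 1e9 if torch.cuda.is_available() else 0.0
print(f"Free disk: {free_disk_gb:.1f} GB (>= 35 GB recommended)")
print(f"GPU memory: {gpu_gb:.1f} GB (>= 30 GB recommended)")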
We show step by step how to generate adversarial images for the LLaVA-7b-chat model. The pretrained LoRA weights can be downloaded from this Hugging Face repo. By default, the adversaries are trained by adding epsilon-constrained noise over the whole image.
To train these images, first download the LLaVA checkpoint:
poetry run python download.py models llava-llama-2-7b-chat
To get the list of jobs (with their job IDs) specified by this config file:
poetry run python experiments/exp_demo_imgs/config.py
To run job ID N without wandb logging:
# run w/o wandb
poetry run python run.py train \
--config_path experiments/exp_demo_imgs/config.py \
--log_dir experiments/exp_demo_imgs/logs \
--job_id N \
--playground
To run job ID N with wandb logging to YOUR_WANDB_ENTITY/YOUR_WANDB_PROJECT:
# log in HF using API key
pip install transformers[cli]
huggingface-cli login
# run w/ wandb
poetry run python run.py train \
--config_path experiments/exp_demo_imgs/config.py \
--log_dir experiments/exp_demo_imgs/logs \
--job_id N \
--wandb_entity YOUR_WANDB_ENTITY \
--wandb_project YOUR_WANDB_PROJECT \
--no-playground
To train the adversaries with static or moving patches, we simply need to change the following function, sweep_patches.
def sweep_patches(cur_keys: List[str]) -> List[Transform]:
    return [
        Transform(
            [
                cfg.proc_learnable_image,
                lambda c: cfg.set_input_image(c, EIFFEL_IMAGE),
            ],
            "pat_full",
        )
    ]
Here, we can change cfg.proc_learnable_image to cfg.proc_patch_static for static patches, or to cfg.proc_patch_random_loc for moving patches. These functions can be found in config.py.
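As an example, a minimal sketch of the static-patch variant is shown below; the swap to cfg.proc_patch_static follows the text above, while the "pat_static" label string and the assumption that the remaining arguments stay unchanged are purely illustrative.

# Illustrative static-patch variant (the "pat_static" label is hypothetical)
def sweep_patches(cur_keys: List[str]) -> List[Transform]:
    return [
        Transform(
            [
                cfg.proc_patch_static,  # or cfg.proc_patch_random_loc for moving patches
                lambda c: cfg.set_input_image(c, EIFFEL_IMAGE),
            ],
            "pat_static",
        )
    ]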
To apply defense mechanisms to adversarial images, navigate to the notebooks/image_defense.ipynb notebook. There is an option to choose between six defenses:
- Rescaling / Resizing
- JPEG Compression
- Cropping
- Gaussian Noise
- Color Bit Depth Reduction
- Total Variation Denoising
Follow the notebook comments to generate defense images.
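As a rough illustration of what these defenses do, the sketch below implements three of them (JPEG compression, Gaussian noise, and color bit depth reduction) with PIL and NumPy; the function names and default parameters are hypothetical and do not come from the notebook.

# Hypothetical helpers illustrating three of the defenses (names and defaults are not from the notebook)
from io import BytesIO

import numpy as np
from PIL import Image


def jpeg_compress(img: Image.Image, quality: int = 75) -> Image.Image:
    # Re-encode the image as JPEG to destroy high-frequency adversarial noise.
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")


def add_gaussian_noise(img: Image.Image, std: float = 8.0) -> Image.Image:
    # Add zero-mean Gaussian noise (in 0-255 pixel units) and clip back to the valid range.
    arr = np.asarray(img).astype(np.float32)
    noisy = arr + np.random.normal(0.0, std, size=arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))


def reduce_bit_depth(img: Image.Image, bits: int = 4) -> Image.Image:
    # Quantize each color channel to `bits` bits, removing fine-grained perturbations.
    arr = np.asarray(img)
    shift = 8 - bits
    return Image.fromarray(((arr >> shift) << shift).astype(np.uint8))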
For better interpretability, visualizations are created to examine the locations of clean, adversarial, and defense images in the embedding space, as well as to highlight important regions of the images. Three types of visualizations are generated:
- PCA Plots of Clean, Adversarial, and Defense Image Embeddings (in the notebooks/image_embedding_7b.ipynb notebook)
- Histograms of L2-Norm and Cosine Similarities of Clean, Adversarial, and Defense Image Embeddings (in the notebooks/image_embedding_7b.ipynb notebook)
- Saliency Maps of Clean and Adversarial Images (in the notebooks/saliency-map-experiment.ipynb notebook)
Follow the notebook comments to create relevant visualizations.
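For reference, here is a minimal sketch of the PCA-style embedding plot, assuming you have already extracted (N, D) NumPy arrays of image embeddings for the clean, adversarial, and defense images; the variable names and plotting details are hypothetical and differ from the notebook.

# Minimal PCA plot sketch; `clean_emb`, `adv_emb`, and `def_emb` are assumed (N, D) arrays
# of image embeddings and do not correspond to variables in the notebook.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA


def plot_embedding_pca(clean_emb, adv_emb, def_emb, out_path="pca_embeddings.png"):
    all_emb = np.concatenate([clean_emb, adv_emb, def_emb], axis=0)
    proj = PCA(n_components=2).fit_transform(all_emb)
    n_clean, n_adv = len(clean_emb), len(adv_emb)
    plt.figure()
    plt.scatter(proj[:n_clean, 0], proj[:n_clean, 1], label="clean")
    plt.scatter(proj[n_clean:n_clean + n_adv, 0], proj[n_clean:n_clean + n_adv, 1], label="adversarial")
    plt.scatter(proj[n_clean + n_adv:, 0], proj[n_clean + n_adv:, 1], label="defense")
    plt.legend()
    plt.savefig(out_path)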
@misc{bailey2023image,
title={Image Hijacks: Adversarial Images can Control Generative Models at Runtime},
author={Luke Bailey and Euan Ong and Stuart Russell and Scott Emmons},
year={2023},
eprint={2309.00236},
archivePrefix={arXiv},
primaryClass={cs.LG}
}