
Learning to Generate Noise for Multi-Attack Robustness

This is the PyTorch implementation of the paper Learning to Generate Noise for Multi-Attack Robustness.

Authors: Divyam Madaan, Jinwoo Shin, Sung Ju Hwang

Abstract

Adversarial learning has emerged as one of the successful techniques to circumvent the susceptibility of existing methods against adversarial perturbations. However, the majority of existing defense methods are tailored to defend against a single category of adversarial perturbation (e.g. $\ell_\infty$-attack). In safety-critical applications, this makes these methods extraneous as the attacker can adopt diverse adversaries to deceive the system. Moreover, training on multiple perturbations simultaneously significantly increases the computational overhead during training. To address these challenges, we propose a novel meta-learning framework that explicitly learns to generate noise to improve the model's robustness against multiple types of attacks. Its key component is Meta Noise Generator (MNG) that outputs optimal noise to stochastically perturb a given sample, such that it helps lower the error on diverse adversarial perturbations. By utilizing samples generated by MNG, we train a model by enforcing the label consistency across multiple perturbations. We validate the robustness of models trained by our scheme on various datasets and against a wide variety of perturbations, demonstrating that it significantly outperforms the baselines across multiple perturbations with a marginal computational cost.

Contribution of this work

  • We introduce an Adversarial Consistency (AC) loss that enforces label consistency across multiple perturbations, encouraging smooth and robust networks (a minimal sketch of such a consistency term follows this list).
  • We formulate a Meta-Noise Generator (MNG) that explicitly meta-learns an input-dependent noise generator, such that it outputs a stochastic noise distribution that improves the model's robustness and adversarial consistency across multiple types of adversarial perturbations.
  • We validate our proposed method on various datasets against diverse benchmark adversarial attacks, on which it achieves state-of-the-art performance, highlighting its practical impact.
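To make the AC loss concrete, here is a minimal PyTorch sketch of a Jensen-Shannon-style consistency term across the model's predictions on different perturbed views of the same batch (e.g. clean, adversarial, and MNG-augmented inputs). The function name and the choice of views are illustrative assumptions; the exact formulation used in train_MNG.py may differ.

import torch
import torch.nn.functional as F

def adversarial_consistency_loss(logits_list):
    # logits_list: predictions of shape (batch, num_classes) for several
    # perturbed views of the same inputs (clean, adversarial, augmented, ...).
    probs = [F.softmax(logits, dim=1) for logits in logits_list]
    # Log of the mixture (average) distribution, clamped for numerical stability.
    log_mixture = torch.clamp(torch.stack(probs).mean(dim=0), 1e-7, 1.0).log()
    # Average KL(p_i || mixture) over all views, i.e. a Jensen-Shannon-style divergence.
    return sum(F.kl_div(log_mixture, p, reduction="batchmean") for p in probs) / len(probs)

In a training step this term would typically be added to the classification loss with a trade-off weight, e.g. loss = ce_loss + lambda_ac * adversarial_consistency_loss([logits_clean, logits_adv, logits_aug]).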

Prerequisites

$ pip install -r requirements.txt

For RST, the data can be obtained from here

Run

  1. CIFAR-10 experiment

# Meta Noise Generator with Adversarial Consistency and RST
$ python train_MNG.py --fname MNG_cifar10 --dataset cifar10 --model WideResNet --rst=True

# Meta Noise Generator with Adversarial Consistency
$ python train_MNG.py --fname MNG_cifar10 --dataset cifar10 --model WideResNet 

# Stochastic Adversarial Training
$ python train_pgd.py --fname MNG_cifar10 --dataset cifar10 --model WideResNet --attack_type random

# Evaluation
## PGD attacks
$ python evaluate.py --fname MNG_cifar10 --dataset cifar10 --model WideResNet --attack_lib custom --norm linf

## Foolbox attacks
$ python evaluate.py --fname MNG_cifar10 --dataset cifar10 --model WideResNet --attack_lib foolbox

## Autoattack attacks
$ python evaluate.py --fname MNG_cifar10 --dataset cifar10 --model WideResNet --attack_lib autoattack --norm linf
  2. SVHN experiment

# Meta Noise Generator with Adversarial Consistency and RST
$ python train_MNG.py --fname MNG_svhn --dataset svhn --model WideResNet --rst=True

# Meta Noise Generator with Adversarial Consistency
$ python train_MNG.py --fname MNG_svhn --dataset svhn --model WideResNet 

# Stochastic Adversarial Training
$ python train_pgd.py --fname MNG_svhn --dataset svhn --model WideResNet --attack_type random

# Evaluation
## PGD attacks
$ python evaluate.py --fname MNG_svhn --dataset svhn --model WideResNet --attack_lib custom --norm linf

## Foolbox attacks
$ python evaluate.py --fname MNG_svhn --dataset svhn --model WideResNet --attack_lib foolbox

## Autoattack attacks
$ python evaluate.py --fname MNG_svhn --dataset svhn --model WideResNet --attack_lib autoattack --norm linf
  3. Tiny-ImageNet experiment

# Meta Noise Generator with Adversarial Consistency
$ python train_MNG.py --fname MNG_tinyimagenet --dataset tinyimagenet --model resnet50


# Stochastic Adversarial Training
$ python train_pgd.py --fname MNG_tinyimagenet --dataset tinyimagenet --model resnet50 --attack_type random

# Evaluation
## PGD attacks
$ python evaluate.py --fname MNG_tinyimagenet --dataset tinyimagenet --model resnet50 --attack_lib custom --norm linf

## Foolbox attacks
$ python evaluate.py --fname MNG_tinyimagenet --dataset tinyimagenet --model resnet50 --attack_lib foolbox

## Autoattack attacks
$ python evaluate.py --fname MNG_tinyimagenet --dataset tinyimagenet --model resnet50 --attack_lib autoattack --norm linf
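The --attack_type random option used above corresponds to stochastic adversarial training, where each mini-batch is perturbed by an attack sampled at random from the considered threat models instead of computing all of them. Below is a rough PyTorch sketch of that idea, assuming image inputs in [0, 1]; the epsilon budgets, step sizes, and the omission of the l1 attack are simplifications, and the actual logic lives in train_pgd.py.

import random
import torch
import torch.nn.functional as F

def pgd(model, x, y, norm="linf", eps=8 / 255, steps=10):
    # Minimal PGD in either the l_inf or l_2 ball (l_1 omitted for brevity).
    alpha = 2.5 * eps / steps
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        if norm == "linf":
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        else:  # l2: normalized gradient step, then project back into the l2 ball
            g = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
            delta = delta + alpha * g
            scale = (eps / (delta.flatten(1).norm(dim=1) + 1e-12)).clamp(max=1.0)
            delta = delta * scale.view(-1, 1, 1, 1)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

def stochastic_adversarial_batch(model, x, y):
    # Sample one perturbation type per batch, in the spirit of --attack_type random.
    norm, eps = random.choice([("linf", 8 / 255), ("l2", 0.5)])
    return pgd(model, x, y, norm=norm, eps=eps)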

Pretrained models

Dataset         Architecture        Average   Max    MSD    MNG-AC   MNG-AC + RST
CIFAR10         WideResNet 28-10    ckpt      ckpt   ckpt   ckpt     ckpt
SVHN            WideResNet 28-10    ckpt      ckpt   ckpt   ckpt     ckpt
Tiny-ImageNet   ResNet50            ckpt      ckpt   ckpt   ckpt     -
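
To evaluate one of these checkpoints, build the matching architecture and load its weights. The sketch below covers the CIFAR-10 WideResNet 28-10 case; the import path, constructor arguments, file name, and state-dict key are assumptions about the repository layout, so verify them against the code before use.

import torch
# Assumed import path for the repository's WideResNet; adjust to the actual module layout.
from models import WideResNet

# Architecture assumed to match the CIFAR-10 checkpoints (WideResNet 28-10, 10 classes).
model = WideResNet(depth=28, widen_factor=10, num_classes=10)

# File name is illustrative; use the path of the downloaded checkpoint.
state = torch.load("MNG_cifar10.pth", map_location="cpu")
# Checkpoints sometimes wrap the weights in a dict with extra metadata.
model.load_state_dict(state["state_dict"] if "state_dict" in state else state)
model.eval()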

Contributing

We'd love to accept your contributions to this project. Please feel free to open an issue, or submit a pull request as necessary. If you have implementations of this repository in other ML frameworks, please reach out so we may highlight them here.

Acknowledgment

The code is built upon locuslab/fast_adversarial and locuslab/robust_union.

Citation

If you found the provided code useful, please cite our work.

@inproceedings{
    madaan2021learning,
    title={Learning to Generate Noise for Multi-Attack Robustness},
    author={Divyam Madaan and Jinwoo Shin and Sung Ju Hwang},
    booktitle={International Conference on Machine Learning},
    year={2021},
    url={https://arxiv.org/abs/2006.12135}
}