
Revisiting Residual Networks for Adversarial Robustness: An Architectural Perspective [arXiv]

Overview

This work presents a holistic study of the impact of architectural choice on adversarial robustness.

(Left) Impact of architectural components on adversarial robustness on CIFAR-10, relative to that of adversarial training methods. (Right) Chronological progress of SotA robust accuracy on CIFAR-10 against AutoAttack, without additional data, under perturbations of ε = 8/255.

Impact of Block-level Design

The design of a block primarily comprises its topology, type of convolution and kernel size, choice of activation, and normalization. We examine these elements independently through controlled experiments and propose a novel residual block, dubbed RobustResBlock, based on our observations. An overview of RobustResBlock is provided below:

[Figure: overview of the RobustResBlock design]
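To make the ingredients concrete, here is a minimal PyTorch sketch of a pre-activation residual block with squeeze-and-excitation, two of the components RobustResBlock builds on. This is an illustrative simplification, not the repository's implementation: the real block in models/resnet.py additionally uses multi-scale (`scales`) and grouped (`cardinality`) convolutions.

  import torch
  import torch.nn as nn

  class SqueezeExcite(nn.Module):
      """Channel re-weighting via global pooling + a small bottleneck MLP."""
      def __init__(self, chs, reduction=64):
          super().__init__()
          hidden = max(chs // reduction, 1)
          self.fc1 = nn.Linear(chs, hidden)
          self.fc2 = nn.Linear(hidden, chs)

      def forward(self, x):
          s = x.mean(dim=(2, 3))                      # global average pool -> (B, C)
          s = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))
          return x * s[:, :, None, None]              # re-weight each channel

  class PreActBlockSketch(nn.Module):
      """Pre-activation residual block (BN -> ReLU -> conv) with SE."""
      def __init__(self, in_chs, out_chs, stride=1):
          super().__init__()
          self.bn1 = nn.BatchNorm2d(in_chs)
          self.conv1 = nn.Conv2d(in_chs, out_chs, 3, stride, 1, bias=False)
          self.bn2 = nn.BatchNorm2d(out_chs)
          self.conv2 = nn.Conv2d(out_chs, out_chs, 3, 1, 1, bias=False)
          self.se = SqueezeExcite(out_chs)
          self.shortcut = (nn.Identity() if stride == 1 and in_chs == out_chs
                           else nn.Conv2d(in_chs, out_chs, 1, stride, bias=False))

      def forward(self, x):
          out = self.conv1(torch.relu(self.bn1(x)))   # pre-activation ordering
          out = self.conv2(torch.relu(self.bn2(out)))
          out = self.se(out)                          # SE before the residual add
          return out + self.shortcut(x)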

Table 1. White-box adversarial robustness of WRN with RobustResBlock

| Model | # Params | # FLOPs | PGD20 | CW40 | Checkpoint |
|---|---|---|---|---|---|
| D = 4, W = 10 | 39.6M | 6.00G | 57.70 | 54.71 | [BaiduDisk] |
| D = 5, W = 12 | 70.5M | 10.6G | 58.46 | 55.56 | [BaiduDisk] |
| D = 7, W = 14 | 133M | 19.6G | 59.41 | 56.62 | [BaiduDisk] |
| D = 11, W = 16 | 270M | 39.3G | 60.48 | 57.78 | [BaiduDisk] |

Impact of Network-level Design

Independent Scaling by Depth (D1 : D2 : D3 = 2 : 2 : 1)

We allow the depth D_i of each stage i ∈ {1, 2, 3} to vary among {2, 3, 4, 5, 7, 9, 11}; details and pre-trained checkpoints for all 7³ = 343 depth settings are available from here.

[Figure: adversarial robustness under independent depth scaling]

Independent Scaling by Width (W1 : W2 : W3 = 2 : 2.5 : 1)

We allow the width (in terms of widening factor) W_i of each stage i ∈ {1, 2, 3} to vary among {4, 6, 8, 10, 12, 14, 16, 20}; details and pre-trained checkpoints for all 8³ = 512 width settings are available from here.

[Figure: adversarial robustness under independent width scaling]
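For reference, the two search grids above are straightforward to enumerate; a minimal sketch (the choice lists are taken from the text, the variable names are ours):

  from itertools import product

  depth_choices = [2, 3, 4, 5, 7, 9, 11]
  width_choices = [4, 6, 8, 10, 12, 14, 16, 20]
  depth_grid = list(product(depth_choices, repeat=3))  # (D1, D2, D3): 7**3 = 343 settings
  width_grid = list(product(width_choices, repeat=3))  # (W1, W2, W3): 8**3 = 512 settings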

Interplay between Depth and Width (D_i : W_i = 7 : 3)

[Figure: compound scaling of depth and width]

[Figure: comparison of independent and compound scaling]

Table 2. Performance of independent scaling (D or W) and compound scaling (D & W)

| # FLOPs target | Scale by | D1 | W1 | D2 | W2 | D3 | W3 | # Params | # FLOPs | PGD20 | CW40 | Checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5G | D | 5 | 10 | 5 | 10 | 2 | 10 | 24.0M | 5.25G | 56.05 | 53.14 | [BaiduDisk] |
| 5G | W | 4 | 11 | 4 | 13 | 4 | 6 | 24.5M | 5.71G | 56.89 | 53.87 | [BaiduDisk] |
| 5G | D & W | 14 | 5 | 14 | 7 | 7 | 3 | 17.7M | 5.09G | 57.49 | 54.78 | [BaiduDisk] |
| 10G | D | 6 | 12 | 6 | 12 | 3 | 12 | 48.5M | 9.59G | 56.42 | 53.91 | [BaiduDisk] |
| 10G | W | 5 | 13 | 5 | 16 | 5 | 7 | 44.4M | 10.5G | 57.06 | 54.29 | [BaiduDisk] |
| 10G | D & W | 17 | 7 | 17 | 9 | 8 | 4 | 39.3M | 9.74G | 58.06 | 55.45 | [BaiduDisk] |
| 20G | D | 9 | 14 | 8 | 14 | 4 | 14 | 90.4M | 18.6G | 57.11 | 54.48 | [BaiduDisk] |
| 20G | W | 7 | 16 | 7 | 18 | 7 | 8 | 81.7M | 20.4G | 58.02 | 55.34 | [BaiduDisk] |
| 20G | D & W | 22 | 8 | 22 | 11 | 11 | 5 | 74.8M | 20.3G | 58.47 | 56.14 | [BaiduDisk] |
| 40G | D | 14 | 16 | 13 | 16 | 11 | 16 | 185M | 38.8G | 57.90 | 55.79 | [BaiduDisk] |
| 40G | W | 11 | 18 | 11 | 21 | 11 | 9 | 170M | 42.7G | 58.48 | 56.15 | [BaiduDisk] |
| 40G | D & W | 27 | 10 | 28 | 14 | 13 | 6 | 147M | 40.4G | 58.76 | 56.59 | [BaiduDisk] |

Adversarially Robust Residual Networks (RobustResNets)

We use the proposed compound scaling rule to scale RobustResBlock and present a portfolio of adversarially robust residual networks.

Table 3. Comparison to SotA methods with additional 500K data

| Method | Model | # Params | # FLOPs | AA | Checkpoint |
|---|---|---|---|---|---|
| RST | WRN-28-10 | 36.5M | 5.20G | 59.53 | |
| AWP | WRN-28-10 | 36.5M | 5.20G | 60.04 | |
| HAT | WRN-28-10 | 36.5M | 5.20G | 62.50 | |
| Gowal et al. | WRN-28-10 | 36.5M | 5.20G | 62.80 | |
| Huang et al. | WRN-34-R | 68.1M | 19.1G | 62.54 | |
| Ours | RobustResNet-A1 | 19.2M | 5.11G | 63.70 | [BaiduDisk] |
| Ours | WRN-A4 | 147M | 40.4G | 65.79 | [BaiduDisk] |

How to use

1. Use our RobustResNets

  from models.resnet import PreActResNet

  # Per-stage depths (D1, D2, D3) and channels derived from the widening factors (W1, W2, W3)
  depth = [D1, D2, D3]
  channels = [16, 16*W1, 32*W2, 64*W3]
  block_types = ['robust_res_block', 'robust_res_block', 'robust_res_block']

  # Syntax
  model = PreActResNet(
    depth_configs=depth,
    channel_configs=channels,
    block_types=block_types,
    scales=8,
    base_width=10,
    cardinality=4,
    se_reduction=64,
    num_classes=10)  # 10 classes for CIFAR-10/SVHN/MNIST
  # See the "D & W" rows of Table 2 for (D1, D2, D3) and (W1, W2, W3); examples below
  RobustResNet_A1 = PreActResNet(
    depth_configs=[14, 14, 7],
    channel_configs=[16, 16*5, 32*7, 64*3],    # widening factors (5, 7, 3)
    ...)
  RobustResNet_A2 = PreActResNet(
    depth_configs=[17, 17, 8],
    channel_configs=[16, 16*7, 32*9, 64*4],    # widening factors (7, 9, 4)
    ...)
  RobustResNet_A3 = PreActResNet(
    depth_configs=[22, 22, 11],
    channel_configs=[16, 16*8, 32*11, 64*5],   # widening factors (8, 11, 5)
    ...)
  RobustResNet_A4 = PreActResNet(
    depth_configs=[27, 28, 13],
    channel_configs=[16, 16*10, 32*14, 64*6],  # widening factors (10, 14, 6)
    ...)
  
  # If you prefer WRN's basic block but with our scalings
  WRN_A1 = PreActResNet(
    depth_configs=[14, 14, 7],
    channel_configs=[16, 16*5, 32*7, 64*3],    # widening factors (5, 7, 3)
    block_types=['basic_block', 'basic_block', 'basic_block'],
    ...)
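Once the elided keyword arguments (the `...` above) are filled in as in the Syntax block, a quick shape check might look like this (hypothetical usage):

  import torch

  logits = RobustResNet_A1(torch.randn(2, 3, 32, 32))  # a CIFAR-sized batch
  print(logits.shape)  # expected: torch.Size([2, 10])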

2. Just want to use our block RobustResBlock

  from models.resnet import RobustResBlock

  # See Table 1 above for the performance of RobustResBlock
  block = RobustResBlock(
    in_chs, out_chs,  # input / output channel counts
    kernel_size=3,
    scales=8,
    base_width=10,
    cardinality=4,
    se_reduction=64,
    activation='ReLU',
    normalization='BatchNorm')
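Hypothetical usage of a single block, assuming the constructor signature shown above; the channel counts are placeholders:

  import torch

  block = RobustResBlock(
    160, 160,  # in_chs == out_chs, so the output shape matches the input
    kernel_size=3, scales=8, base_width=10, cardinality=4,
    se_reduction=64, activation='ReLU', normalization='BatchNorm')
  out = block(torch.randn(2, 160, 32, 32))
  print(out.shape)  # expected: torch.Size([2, 160, 32, 32])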

3. Use our compound scaling rule, RobustScaling, to scale your custom models

Please see examples/compound_scaling.ipynb
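The notebook is the authoritative reference; as rough intuition for what compound scaling does, here is a hypothetical sketch that allocates a FLOP budget between depth and width. It assumes FLOPs grow linearly with depth and quadratically with width, and uses a 70/30 depth-to-width split; the function name and exact allocation are ours, not the repository's.

  def compound_scale(base_depths, base_widths, flop_multiplier,
                     d_share=0.7, w_share=0.3):
      """Scale per-stage depths/widths so d * w**2 ~= flop_multiplier."""
      d = flop_multiplier ** d_share         # depth factor (FLOPs linear in depth)
      w = flop_multiplier ** (w_share / 2)   # width factor (FLOPs quadratic in width)
      depths = [max(1, round(di * d)) for di in base_depths]
      widths = [max(1, round(wi * w)) for wi in base_widths]
      return depths, widths

  # e.g. doubling the FLOP budget of a (5, 5, 2) x (10, 10, 10) baseline:
  print(compound_scale([5, 5, 2], [10, 10, 10], 2.0))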

How to evaluate pre-trained models

  • Download the checkpoints, which should contain the following:
    arch_xxx/
      -arch_xxx.log  # training log
      -arch_xxx.yaml  # configuration file 
      -checkpoints/
        -arch_xxx.pth  # last epoch checkpoint
        -arch_xxx_best.pth  # checkpoint for best robust acc on valid set
    
  • Run the following lines to evaluate adversarial robustness
  python eval_robustness.py \
    --data "path to data" \
    --config_file_path "path to configuration yaml file" \
    --checkpoint_path "path to checkpoint pth file" \
    --save_path "path to file for logging evaluation" \
    --attack_choice [FGSM/PGD/CW/AA] \
    --num_steps [1/20/40/0] \  # 1 for FGSM, 20 for PGD, 40 for CW, 0 for AA
    --batch_size 100  # batch size for evaluation, adjust according to your GPU memory
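To evaluate all four attacks in one go, a hypothetical Python wrapper around the documented flags (all paths are placeholders to fill in):

  import subprocess

  for attack, steps in [("FGSM", 1), ("PGD", 20), ("CW", 40), ("AA", 0)]:
      subprocess.run([
          "python", "eval_robustness.py",
          "--data", "/path/to/data",
          "--config_file_path", "arch_xxx/arch_xxx.yaml",
          "--checkpoint_path", "arch_xxx/checkpoints/arch_xxx_best.pth",
          "--save_path", f"arch_xxx_eval_{attack}.log",
          "--attack_choice", attack,
          "--num_steps", str(steps),
          "--batch_size", "100",
      ], check=True)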

CIFAR-10 (TRADES)

| Model | # Params | # FLOPs | Clean | PGD20 | CW40 | AA | Checkpoint |
|---|---|---|---|---|---|---|---|
| WRN-28-10 | 36.5M | 5.20G | 84.62 | 55.90 | 53.15 | 51.66 | [BaiduDisk] |
| RobNet-large-v2 | 33.3M | 5.10G | 84.57 | 52.79 | 48.94 | 47.48 | [BaiduDisk] |
| AdvRush | 32.6M | 4.97G | 84.95 | 56.99 | 53.27 | 52.90 | [BaiduDisk] |
| RACL | 32.5M | 4.93G | 83.91 | 55.98 | 53.22 | 51.37 | [BaiduDisk] |
| RRN-A1 (ours) | 19.2M | 5.11G | 85.46 | 58.47 | 55.72 | 54.42 | [BaiduDisk] |
| WRN-34-12 | 66.5M | 9.60G | 84.93 | 56.01 | 53.53 | 51.97 | [BaiduDisk] |
| WRN-34-R | 68.1M | 19.1G | 85.80 | 57.35 | 54.77 | 53.23 | [BaiduDisk] |
| RRN-A2 (ours) | 39.0M | 10.8G | 85.80 | 59.72 | 56.74 | 55.49 | [BaiduDisk] |
| WRN-46-14 | 128M | 18.6G | 85.22 | 56.37 | 54.19 | 52.63 | [BaiduDisk] |
| RRN-A3 (ours) | 75.9M | 19.9G | 86.79 | 60.10 | 57.29 | 55.84 | [BaiduDisk] |
| WRN-70-16 | 267M | 38.8G | 85.51 | 56.78 | 54.52 | 52.80 | [BaiduDisk] |
| RRN-A4 (ours) | 147M | 39.4G | 87.10 | 60.26 | 57.90 | 56.29 | [BaiduDisk] |

CIFAR-100 (TRADES)

| Model | # Params | # FLOPs | Clean | PGD20 | CW40 | AA | Checkpoint |
|---|---|---|---|---|---|---|---|
| WRN-28-10 | 36.5M | 5.20G | 56.30 | 29.91 | 26.22 | 25.26 | [BaiduDisk] |
| RobNet-large-v2 | 33.3M | 5.10G | 55.27 | 29.23 | 24.63 | 23.69 | [BaiduDisk] |
| AdvRush | 32.6M | 4.97G | 56.40 | 30.40 | 26.16 | 25.27 | [BaiduDisk] |
| RACL | 32.5M | 4.93G | 56.09 | 30.38 | 26.65 | 25.65 | [BaiduDisk] |
| RRN-A1 (ours) | 19.2M | 5.11G | 59.34 | 32.70 | 27.76 | 26.75 | [BaiduDisk] |
| WRN-34-12 | 66.5M | 9.60G | 56.08 | 29.87 | 26.51 | 25.47 | [BaiduDisk] |
| WRN-34-R | 68.1M | 19.1G | 58.78 | 31.17 | 27.33 | 26.31 | [BaiduDisk] |
| RRN-A2 (ours) | 39.0M | 10.8G | 59.38 | 33.00 | 28.71 | 27.68 | [BaiduDisk] |
| WRN-46-14 | 128M | 18.6G | 56.78 | 30.03 | 27.27 | 26.28 | [BaiduDisk] |
| RRN-A3 (ours) | 75.9M | 19.9G | 60.16 | 33.59 | 29.58 | 28.48 | [BaiduDisk] |
| WRN-70-16 | 267M | 38.8G | 56.93 | 29.76 | 27.20 | 26.12 | [BaiduDisk] |
| RRN-A4 (ours) | 147M | 39.4G | 61.66 | 34.25 | 30.04 | 29.00 | [BaiduDisk] |

CIFAR-10 (SAT)

| Model | # Params | # FLOPs | PGD20 | CW40 | Checkpoint |
|---|---|---|---|---|---|
| WRN-28-10 | 36.5M | 5.20G | 52.44 | 50.97 | [BaiduDisk] |
| RRN-A1 (ours) | 19.2M | 5.11G | 57.62 | 56.06 | [BaiduDisk] |
| WRN-34-12 | 66.5M | 9.60G | 52.85 | 51.36 | [BaiduDisk] |
| RRN-A2 (ours) | 39.0M | 10.8G | 58.39 | 56.99 | [BaiduDisk] |
| WRN-46-14 | 128M | 18.6G | 53.67 | 52.95 | [BaiduDisk] |
| RRN-A3 (ours) | 75.9M | 19.9G | 58.81 | 57.60 | [BaiduDisk] |
| WRN-70-16 | 267M | 38.8G | 54.12 | 50.52 | [BaiduDisk] |
| RRN-A4 (ours) | 147M | 39.4G | 59.01 | 57.85 | [BaiduDisk] |

CIFAR-10 (MART)

| Model | # Params | # FLOPs | PGD20 | CW40 | Checkpoint |
|---|---|---|---|---|---|
| WRN-28-10 | 36.5M | 5.20G | 57.69 | 52.88 | [BaiduDisk] |
| RRN-A1 (ours) | 19.2M | 5.11G | 59.34 | 54.42 | [BaiduDisk] |
| WRN-34-12 | 66.5M | 9.60G | 57.40 | 53.11 | [BaiduDisk] |
| RRN-A2 (ours) | 39.0M | 10.8G | 60.33 | 55.51 | [BaiduDisk] |
| WRN-46-14 | 128M | 18.6G | 58.43 | 54.32 | [BaiduDisk] |
| RRN-A3 (ours) | 75.9M | 19.9G | 60.95 | 56.52 | [BaiduDisk] |
| WRN-70-16 | 267M | 38.8G | 58.15 | 54.37 | [BaiduDisk] |
| RRN-A4 (ours) | 147M | 39.4G | 61.88 | 57.55 | [BaiduDisk] |

How to train

Baseline adversarial training

python -m torch.distributed.launch \
  --nproc_per_node=2 --master_port 24220 \  # use a random port number
  main_dist.py \
  --config_path ./configs/CIFAR10 \
  --exp_name ./exps/CIFAR10 \  # path to where you want to store training stats
  --version [WRN-A1/A2/A3/A4] \  # you may also change it to RobustResNet-A1/A2/A3/A4
  --train \
  --data_parallel \
  --apex-amp

Advanced adversarial training

Please download the additional pseudolabeled data from Carmon et al., 2019.

python -m torch.distributed.launch \
  --nproc_per_node=8 --master_port 14226 \  # use a random port number
  adv-main_dist.py \
  --log-dir ./checkpoints/ \  # path to where you want to store training stats
  --config-path ./configs/Advanced_CIFAR10 \
  --version [WRN-A1/A2/A3/A4] \
  --desc drna4-basic-silu-apex-500k \  # name of the folder for storing training stats
  --apex-amp --adv-eval-freq 5 \  # evaluating too frequently will significantly slow down training
  --start-eval 310 \  # start evaluating after N epochs
  --advnorm --adjust_bn True \
  --num-adv-epochs 400 --batch-size 1024 --lr 0.4 --weight-decay 0.0005 --beta 6.0 \
  --data-dir /datasets/ --data cifar10s \
  --aux-data-filename /datasets/ti_500K_pseudo_labeled.pickle \  # path to the downloaded pseudolabeled data
  --unsup-fraction 0.7

Requirements

The code has been implemented and tested with Python 3.8.5, PyTorch 1.8.0, and NVIDIA Apex (used for mixed-precision acceleration).

Part of the code is based on other open-source repositories.
