This file documents a collection of baselines trained with pycls, primarily for the Designing Network Design Spaces paper. All configurations for these baselines are located in the configs/dds_baselines directory. The tables below provide results and useful statistics about training and inference, along with links to the pretrained models. The following experimental and training settings apply to all of the training and inference runs.
- All baselines were run on Big Basin servers with 8 NVIDIA Tesla V100 GPUs (16GB GPU memory).
- All baselines were run using PyTorch 1.6, CUDA 9.2, and cuDNN 7.6.
- Inference times are reported for 64 images on 1 GPU for all models.
- Training times are reported for 100 epochs on 8 GPUs with the batch size listed.
- The reported errors are averaged across 5 reruns for robust estimates.
- The provided checkpoints are from the runs with errors closest to the average.
- All models and results below are on the ImageNet-1k dataset.
- The model id column is provided for ease of reference.
Our primary goal is to provide simple and strong baselines that are easy to reproduce. For all models, we use our basic training settings without any training enhancements (e.g., DropOut, DropConnect, AutoAugment, EMA, etc.) or testing enhancements (e.g., multi-crop, multi-scale, flipping, etc.); please see our Designing Network Design Spaces paper for more information.
- We use SGD with momentum of 0.9, a half-period cosine schedule, and train for 100 epochs.
- For ResNet/ResNeXt/RegNet, we use a reference learning rate of 0.1 and a weight decay of 5e-5 (see Figure 21).
- For EfficientNet, we use a reference learning rate of 0.2 and a weight decay of 1e-5 (see Figure 22).
- The actual learning rate for each model is computed as (batch-size / 128) * reference-lr.
- For training, we use aspect-ratio cropping, horizontal flipping, PCA lighting noise, and per-channel mean and SD normalization.
- At test time, we rescale images to (256 / 224) * train-res and take the center crop of train-res.
- For ResNet/ResNeXt/RegNet, we use the image size of 224x224 for training.
- For EfficientNet, the training image size varies following the original paper.
For 8 GPU training, we apply a 5-epoch gradual warmup, following the ImageNet in 1 Hour paper. Note that the learning rate scaling rule described above is similar to the one from the ImageNet in 1 Hour paper, but the number of images per GPU varies among models. To understand how the configs are adjusted, please see the examples in the configs/lr_scaling directory.
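A minimal sketch of the combined schedule, assuming a linear warmup into the half-period cosine decay (the exact pycls implementation may differ, e.g. in how warmup and decay phases overlap):

```python
import math

def lr_at_epoch(epoch, base_lr, total_epochs=100, warmup_epochs=5):
    """Hypothetical schedule: 5-epoch linear warmup, then half-period cosine."""
    if epoch < warmup_epochs:
        # Linear ramp toward base_lr over the warmup epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Half-period cosine: decays from base_lr toward 0 by the final epoch.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * t))

print(lr_at_epoch(0, 0.8))  # ~0.16 (warmup)
print(lr_at_epoch(5, 0.8))  # 0.8 (cosine start)
```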
**RegNetX Models**

model | flops (B) | params (M) | acts (M) | batch size | infer (ms) | train (hr) | error (top-1) | model id | download |
---|---|---|---|---|---|---|---|---|---|
RegNetX-200MF | 0.2 | 2.7 | 2.2 | 1024 | 10 | 2.8 | 31.1 | 160905981 | model |
RegNetX-400MF | 0.4 | 5.2 | 3.1 | 1024 | 15 | 3.9 | 27.3 | 160905967 | model |
RegNetX-600MF | 0.6 | 6.2 | 4.0 | 1024 | 17 | 4.4 | 25.9 | 160906442 | model |
RegNetX-800MF | 0.8 | 7.3 | 5.1 | 1024 | 21 | 5.7 | 24.8 | 160906036 | model |
RegNetX-1.6GF | 1.6 | 9.2 | 7.9 | 1024 | 33 | 8.7 | 23.0 | 160990626 | model |
RegNetX-3.2GF | 3.2 | 15.3 | 11.4 | 512 | 57 | 14.3 | 21.7 | 160906139 | model |
RegNetX-4.0GF | 4.0 | 22.1 | 12.2 | 512 | 69 | 17.1 | 21.4 | 160906383 | model |
RegNetX-6.4GF | 6.5 | 26.2 | 16.4 | 512 | 92 | 23.5 | 20.8 | 161116590 | model |
RegNetX-8.0GF | 8.0 | 39.6 | 14.1 | 512 | 94 | 22.6 | 20.7 | 161107726 | model |
RegNetX-12GF | 12.1 | 46.1 | 21.4 | 512 | 137 | 32.9 | 20.3 | 160906020 | model |
RegNetX-16GF | 15.9 | 54.3 | 25.5 | 512 | 168 | 39.7 | 20.0 | 158460855 | model |
RegNetX-32GF | 31.7 | 107.8 | 36.3 | 256 | 318 | 76.9 | 19.5 | 158188473 | model |
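Since the infer (ms) column is the time to process 64 images on one GPU, per-model throughput can be derived from the table; a small sketch (illustrative only, function name is hypothetical):

```python
def throughput(infer_ms, batch=64):
    """Images per second given the time (ms) to infer one batch of 64."""
    return batch / (infer_ms / 1000.0)

# e.g., RegNetX-200MF (10 ms) vs. RegNetX-32GF (318 ms) from the table above:
print(round(throughput(10)))   # 6400 images/sec
print(round(throughput(318)))  # 201 images/sec
```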
**RegNetY Models**

model | flops (B) | params (M) | acts (M) | batch size | infer (ms) | train (hr) | error (top-1) | model id | download |
---|---|---|---|---|---|---|---|---|---|
RegNetY-200MF | 0.2 | 3.2 | 2.2 | 1024 | 11 | 3.1 | 29.6 | 176245422 | model |
RegNetY-400MF | 0.4 | 4.3 | 3.9 | 1024 | 19 | 5.1 | 25.9 | 160906449 | model |
RegNetY-600MF | 0.6 | 6.1 | 4.3 | 1024 | 19 | 5.2 | 24.5 | 160981443 | model |
RegNetY-800MF | 0.8 | 6.3 | 5.2 | 1024 | 22 | 6.0 | 23.7 | 160906567 | model |
RegNetY-1.6GF | 1.6 | 11.2 | 8.0 | 1024 | 39 | 10.1 | 22.0 | 160906681 | model |
RegNetY-3.2GF | 3.2 | 19.4 | 11.3 | 512 | 67 | 16.5 | 21.0 | 160906834 | model |
RegNetY-4.0GF | 4.0 | 20.6 | 12.3 | 512 | 68 | 16.8 | 20.6 | 160906838 | model |
RegNetY-6.4GF | 6.4 | 30.6 | 16.4 | 512 | 104 | 26.1 | 20.1 | 160907112 | model |
RegNetY-8.0GF | 8.0 | 39.2 | 18.0 | 512 | 113 | 28.1 | 20.1 | 161160905 | model |
RegNetY-12GF | 12.1 | 51.8 | 21.4 | 512 | 150 | 36.0 | 19.7 | 160907100 | model |
RegNetY-16GF | 15.9 | 83.6 | 23.0 | 512 | 189 | 45.6 | 19.6 | 161303400 | model |
RegNetY-32GF | 32.3 | 145.0 | 30.3 | 256 | 319 | 76.0 | 19.0 | 161277763 | model |
**ResNet Models**

model | flops (B) | params (M) | acts (M) | batch size | infer (ms) | train (hr) | error (top-1) | model id | download |
---|---|---|---|---|---|---|---|---|---|
ResNet-50 | 4.1 | 22.6 | 11.1 | 256 | 53 | 12.2 | 23.2 | 161235311 | model |
ResNet-101 | 7.8 | 44.6 | 16.2 | 256 | 90 | 20.4 | 21.4 | 161167170 | model |
ResNet-152 | 11.5 | 60.2 | 22.6 | 256 | 130 | 29.2 | 20.9 | 161167467 | model |
**ResNeXt Models**

model | flops (B) | params (M) | acts (M) | batch size | infer (ms) | train (hr) | error (top-1) | model id | download |
---|---|---|---|---|---|---|---|---|---|
ResNeXt-50 | 4.2 | 25.0 | 14.4 | 256 | 78 | 18.0 | 21.9 | 161167411 | model |
ResNeXt-101 | 8.0 | 44.2 | 21.2 | 256 | 137 | 31.8 | 20.7 | 161167590 | model |
ResNeXt-152 | 11.7 | 60.0 | 29.7 | 256 | 197 | 45.7 | 20.4 | 162471172 | model |
**EfficientNet Models**

model | flops (B) | params (M) | acts (M) | batch size | infer (ms) | train (hr) | error (top-1) | model id | download |
---|---|---|---|---|---|---|---|---|---|
EfficientNet-B0 | 0.4 | 5.3 | 6.7 | 256 | 34 | 11.7 | 24.9 | 161305613 | model |
EfficientNet-B1 | 0.7 | 7.8 | 10.9 | 256 | 52 | 15.6 | 24.1 | 161304979 | model |
EfficientNet-B2 | 1.0 | 9.2 | 13.8 | 256 | 68 | 18.4 | 23.4 | 161305015 | model |
EfficientNet-B3 | 1.8 | 12.0 | 23.8 | 256 | 114 | 32.1 | 22.5 | 161305060 | model |
EfficientNet-B4 | 4.2 | 19.0 | 48.5 | 128 | 240 | 65.1 | 21.2 | 161305098 | model |
EfficientNet-B5 | 9.9 | 30.0 | 98.9 | 64 | 504 | 135.1 | 21.5 | 161305138 | model |