Currently, we support various popular generative models, including unconditional GANs, image translation models, and internal GANs. Meanwhile, our framework has been tested on multiple standard datasets, e.g., FFHQ, CelebA, and LSUN. This note will show how to perform common tasks on these existing models and standard datasets, including:
- Use existing models to generate random samples.
- Test existing models on standard datasets.
- Train predefined models on standard datasets.
In this section, we will specify how to sample fake images by using our unconditional GANs, conditional GANs, and image translation models. For model inference, all of the APIs are included in mmgen/apis/inference.py. The most important function is init_model, which creates a generative model from a config. Passing that model to one of the sampling functions in this file then returns the synthesized images.
MMGeneration provides high-level APIs for sampling images with unconditional GANs. Here is an example for building StyleGAN2-256 and obtaining the synthesized images.
import mmcv
from mmgen.apis import init_model, sample_unconditional_model
# Specify the path to model config and checkpoint file
config_file = 'configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py'
# you can download this checkpoint in advance and use a local file path.
checkpoint_file = 'https://download.openmmlab.com/mmgen/stylegan2/official_weights/stylegan2-church-config-f-official_20210327_172657-1d42b7d1.pth'
device = 'cuda:0'
# init a generative model
model = init_model(config_file, checkpoint_file, device=device)
# sample images
fake_imgs = sample_unconditional_model(model, 4)
We also provide a more user-friendly demo script. You can run demo/unconditional_demo.py with the following command:
python demo/unconditional_demo.py \
${CONFIG_FILE} \
${CHECKPOINT} \
[--save-path ${SAVE_PATH}] \
[--device ${GPU_ID}]
Note that more arguments are available for customizing your sampling procedure. Run python demo/unconditional_demo.py --help to see the details.
MMGeneration provides high-level APIs for sampling images with conditional GANs. Here is an example for building SAGAN-128 and obtaining the synthesized images.
import mmcv
from mmgen.apis import init_model, sample_conditional_model
# Specify the path to model config and checkpoint file
config_file = 'configs/sagan/sagan_128_woReLUinplace_noaug_bigGAN_Glr-1e-4_Dlr-4e-4_ndisc1_imagenet1k_b32x8.py'
# you can download this checkpoint in advance and use a local file path.
checkpoint_file = 'https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.pth'
device = 'cuda:0'
# init a generative model
model = init_model(config_file, checkpoint_file, device=device)
# sample images with random label
fake_imgs = sample_conditional_model(model, 4)
# sample images with the same label
fake_imgs = sample_conditional_model(model, 4, label=0)
# sample images with specific labels
fake_imgs = sample_conditional_model(model, 4, label=[0, 1, 2, 3])
We also provide a more user-friendly demo script. You can run demo/conditional_demo.py with the following command:
python demo/conditional_demo.py \
${CONFIG_FILE} \
${CHECKPOINT} \
[--label ${LABEL}] \
[--samples-per-classes ${SAMPLES_PER_CLASSES}] \
[--sample-all-classes] \
[--save-path ${SAVE_PATH}] \
[--device ${GPU_ID}]
If --label is not passed, images with random labels will be generated.
If --label is passed, ${SAMPLES_PER_CLASSES} images will be generated for each input label.
If --sample-all-classes is set on the command line, --label is ignored and the generator outputs images for all categories.
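The three cases described above can be sketched as a small label-resolution function. This is only an illustration: `resolve_labels`, its parameters, and the choice to repeat every class `samples_per_classes` times when all classes are sampled are assumptions, not the code used in demo/conditional_demo.py.

```python
import random


def resolve_labels(num_classes, label=None, samples_per_classes=1,
                   sample_all_classes=False, num_random=4, seed=None):
    """Return the class label for every image to be sampled.

    Hypothetical helper that mirrors the three documented cases.
    """
    if sample_all_classes:
        # --label is ignored; every category is sampled.
        chosen = list(range(num_classes))
    elif label is not None:
        # Accept a single int or a list of ints.
        chosen = [label] if isinstance(label, int) else list(label)
    else:
        # No label given: draw random labels, one image each.
        rng = random.Random(seed)
        return [rng.randrange(num_classes) for _ in range(num_random)]
    # Repeat each chosen label samples_per_classes times.
    return [c for c in chosen for _ in range(samples_per_classes)]


print(resolve_labels(10, label=[0, 3], samples_per_classes=2))
```

The resulting list could then be passed as the label argument of sample_conditional_model, which accepts a list of labels as shown earlier.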
Note that more arguments are available for customizing your sampling procedure. Run python demo/conditional_demo.py --help to see the details.
MMGeneration provides high-level APIs for translating images by using image translation models. Here is an example of building Pix2Pix and obtaining the translated images.
import mmcv
from mmgen.apis import init_model, sample_img2img_model
# Specify the path to model config and checkpoint file
config_file = 'configs/pix2pix/pix2pix_vanilla_unet_bn_wo_jitter_flip_edges2shoes_b1x4_190k.py'
# you can download this checkpoint in advance and use a local file path.
checkpoint_file = 'https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_wo_jitter_flip_1x4_186840_edges2shoes_convert-bgr_20210902_170902-0c828552.pth?versionId=CAEQMhiBgIC57vTj3RciIGZlNmQ4ZDJhN2E1MDQ5ZmJiOWJmYTY5MDg1ZTc0N2Vi'
# Specify the path to image you want to translate
image_path = 'tests/data/paired/test/33_AB.jpg'
device = 'cuda:0'
# init a generative model
model = init_model(config_file, checkpoint_file, device=device)
# translate a single image
translated_image = sample_img2img_model(model, image_path, target_domain='photo')
We also provide a more user-friendly demo script. You can run demo/translation_demo.py with the following command:
python demo/translation_demo.py \
${CONFIG_FILE} \
${CHECKPOINT} \
${IMAGE_PATH} \
[--save-path ${SAVE_PATH}] \
[--device ${GPU_ID}]
Note that more arguments are available for customizing your sampling procedure. Run python demo/translation_demo.py --help to see the details.
This section details how to prepare datasets for MMGeneration and describes the standard layout used in our default configs. We recommend that users follow the steps below to organize their datasets.
Preparing a dataset for unconditional models is straightforward. First, make a directory named data in the MMGeneration project. Any dataset can then be made available there through a symlink (soft link).
mkdir data
ln -s absolute_path_to_dataset ./data/dataset_name
Since unconditional models only need real images for training and testing, all you need to do is link your dataset to the data directory. Our dataset class will automatically collect all of the images under the specified path (recursively).
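To make the recursive collection concrete, here is a minimal sketch of such a scan using only the standard library. The function name `scan_images` and the list of accepted suffixes are assumptions for illustration; the actual dataset class in MMGeneration may filter files differently.

```python
import os
import tempfile

IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')  # assumed suffix list


def scan_images(root):
    """Return sorted paths of all image files found recursively under root."""
    found = []
    # followlinks=True so that symlinked sub-directories are visited too.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            if name.lower().endswith(IMG_EXTENSIONS):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Tiny demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub', 'deeper'))
    for parts in (('a.png',), ('sub', 'b.jpg'),
                  ('sub', 'deeper', 'c.JPEG'), ('sub', 'notes.txt')):
        open(os.path.join(tmp, *parts), 'w').close()
    demo_paths = [os.path.relpath(p, tmp) for p in scan_images(tmp)]
    print(demo_paths)
```

Because the scan walks the whole tree, nested class folders such as those in LSUN do not need to be flattened before linking.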
Here, we provide several download links of datasets frequently used in unconditional models: LSUN, CelebA, CelebA-HQ, FFHQ.
For translation models, we currently offer two dataset settings: paired image datasets and unpaired image datasets.
For a paired image dataset, every image is formed by concatenating two corresponding images from two domains along the width dimension. You need to make two folders, "train" and "test", filled with images in this format for training and testing. The folder structure is presented below.
./data/dataset_name/
├── test
│ └── XXX.jpg
└── train
└── XXX.jpg
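The paired format described above can be illustrated with a small sketch that joins two equally sized images along the width and splits them again. Images are modelled as plain lists of pixel rows to keep the example dependency-free; `concat_width` and `split_width` are hypothetical helper names, and which domain goes on the left is an assumption.

```python
def concat_width(img_a, img_b):
    """Join two images (lists of pixel rows) side by side, A left and B right."""
    if len(img_a) != len(img_b):
        raise ValueError('both images must have the same height')
    return [row_a + row_b for row_a, row_b in zip(img_a, img_b)]


def split_width(paired):
    """Split a paired image back into its left and right halves."""
    width = len(paired[0])
    if width % 2 != 0:
        raise ValueError('paired image width must be even')
    half = width // 2
    left = [row[:half] for row in paired]
    right = [row[half:] for row in paired]
    return left, right


# 2x2 toy images; each pixel is an (R, G, B) tuple.
img_a = [[(255, 0, 0), (255, 0, 0)], [(255, 0, 0), (255, 0, 0)]]
img_b = [[(0, 0, 255), (0, 0, 255)], [(0, 0, 255), (0, 0, 255)]]
paired = concat_width(img_a, img_b)
print(len(paired), len(paired[0]))  # height is unchanged, width doubles
```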
For an unpaired image dataset, you need to make two folders, "trainA" and "testA", filled with images from domain A, and two folders, "trainB" and "testB", filled with images from domain B. The folder structure is presented below.
./data/dataset_name/
├── testA
│ └── XXX.jpg
├── testB
│ └── XXX.jpg
├── trainA
│ └── XXX.jpg
└── trainB
└── XXX.jpg
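Before launching a run, it can help to verify that a dataset root follows the unpaired layout shown above. The check below is a minimal sketch; `missing_unpaired_folders` is a hypothetical helper and is not part of MMGeneration.

```python
import os
import tempfile

UNPAIRED_SPLITS = ('trainA', 'trainB', 'testA', 'testB')


def missing_unpaired_folders(root):
    """Return the sub-folders of the unpaired layout that root lacks."""
    return [name for name in UNPAIRED_SPLITS
            if not os.path.isdir(os.path.join(root, name))]


# Demonstration: a root that is missing its testB folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('trainA', 'trainB', 'testA'):
        os.mkdir(os.path.join(tmp, name))
    demo_missing = missing_unpaired_folders(tmp)
    print(demo_missing)
```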
Please read the section Datasets for unconditional models and likewise use a symlink (soft link) to set up the dataset.
Here, we provide download links of datasets used in Pix2Pix and CycleGAN.
Currently, we have tested all of the models with distributed training. Thus, we highly recommend adopting distributed training with our scripts. The basic usage is as follows:
bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS_NUMBER} \
--work-dir ./work_dirs/experiments/experiments_name \
[optional arguments]
If you are using a slurm system, the following command can help you start training:
bash tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG} ${WORK_DIR} \
[optional arguments]
These two scripts wrap tools/train.py with a distributed training entrypoint. The optional arguments are defined in tools/train.py. Users can also set random-seed and resume-from with these arguments.
Note that the name work_dirs has already been added to our .gitignore file. Users can put any files there without worrying about changing git-tracked files. Here is an example command that we use to train our 1024x1024 StyleGAN2 model.
bash tools/slurm_train.sh openmmlab-platform stylegan2-1024 \
configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py \
work_dirs/experiments/stylegan2_c2_ffhq_1024_b4x8
During training, log files and checkpoints will be saved to the working directory. Early in our development, we evaluated models only after training finished. The evaluation hook is now supported, so models can also be evaluated during training. More details can be found in our tutorial for running time configuration.
Currently, we support 6 evaluation metrics: MS-SSIM, SWD, IS, FID, Precision&Recall, and PPL. For unconditional GANs, we provide a unified evaluation script in tools/evaluation.py. Additionally, configs/_base_/default_metrics.py offers commonly used configurations. To evaluate your models with a given metric, add it to the metrics field of your config file like this:
# at the end of the configs/styleganv2/stylegan2_c2_ffhq_256_b4x8_800k.py
metrics = dict(
fid50k=dict(
type='FID',
num_images=50000,
inception_pkl='work_dirs/inception_pkl/ffhq-256-50k-rgb.pkl',
bgr2rgb=True))
(We will specify how to obtain inception_pkl in the FID section.)
Then, users can use the evaluation script with the following command:
bash eval.sh ${CONFIG_FILE} ${CKPT_FILE} --batch-size 10 --online
If you are in a slurm environment, please switch to tools/slurm_eval.sh with the following command:
bash slurm_eval.sh ${PLATFORM} ${JOBNAME} ${CONFIG_FILE} ${CKPT_FILE} \
--batch-size 10 \
--online
As you can see, we provide two modes for evaluating your models: online and offline. In online mode, the synthesized images are passed directly to the metrics instead of being saved to the file system. In offline mode, if users have set the --samples-path argument, the generated images are saved in this directory so that users can use them for other tasks. Besides, users can use the offline mode to only sample images:
# for general envs
bash eval.sh ${CONFIG_FILE} ${CKPT_FILE} --eval none
# for slurm
bash slurm_eval.sh ${PLATFORM} ${JOBNAME} ${CONFIG_FILE} ${CKPT_FILE} \
--eval none
We also provide tools/utils/translation_eval.py for users to evaluate their translation models. You need to set the target-domain of the output images and run the following command:
python tools/utils/translation_eval.py ${CONFIG_FILE} ${CKPT_FILE} --t ${target-domain}
Note that the current version of MMGeneration supports multiple GPUs for FID and IS evaluation and for image saving. You can use the following commands to use this feature:
# online evaluation
bash dist_eval.sh ${CONFIG_FILE} ${CKPT_FILE} ${GPUS_NUMBER} --batch-size 10 --online
# online evaluation with slurm
bash slurm_eval_multi_gpu.sh ${PLATFORM} ${JOBNAME} ${CONFIG_FILE} ${CKPT_FILE} --batch-size 10 --online
# offline evaluation
bash dist_eval.sh ${CONFIG_FILE} ${CKPT_FILE} ${GPUS_NUMBER}
# offline evaluation with slurm
bash slurm_eval_multi_gpu.sh ${PLATFORM} ${JOBNAME} ${CONFIG_FILE} ${CKPT_FILE}
# image saving
bash dist_eval.sh ${CONFIG_FILE} ${CKPT_FILE} ${GPUS_NUMBER} --eval none --samples-path ${SAMPLES_PATH}
# image saving with slurm
bash slurm_eval_multi_gpu.sh ${PLATFORM} ${JOBNAME} ${CONFIG_FILE} ${CKPT_FILE} --eval none --samples-path ${SAMPLES_PATH}
In subsequent versions, multi-GPU evaluation will be supported for more metrics.
Next, we will specify the details of different metrics one by one.
Fréchet Inception Distance is a measure of similarity between two datasets of images. It was shown to correlate well with the human judgment of visual quality and is most often used to evaluate the quality of samples of Generative Adversarial Networks. FID is calculated by computing the Fréchet distance between two Gaussians fitted to feature representations of the Inception network.
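For intuition, the Fréchet distance between two Gaussians has the closed form |mu1 - mu2|^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). The sketch below evaluates it under the simplifying assumption of diagonal covariances, which avoids a matrix square root and keeps the example dependency-free. It is a toy illustration with hypothetical helper names, not the FID implementation in MMGeneration, which fits full covariance matrices to Inception features.

```python
import math


def mean_and_var(features):
    """Per-dimension mean and population variance of a list of feature vectors."""
    n = len(features)
    dims = len(features[0])
    mu = [sum(f[d] for f in features) / n for d in range(dims)]
    var = [sum((f[d] - mu[d]) ** 2 for f in features) / n for d in range(dims)]
    return mu, var


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance term for N(mu1, diag(var1)) and N(mu2, diag(var2)).

    With diagonal covariances the trace term reduces to a per-dimension sum
    of var1 + var2 - 2 * sqrt(var1 * var2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term


# Toy "features": the fake set is the real set shifted by 1 in the first dimension.
real = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
fake = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]]
print(frechet_distance_diag(*mean_and_var(real), *mean_and_var(fake)))
```

In this toy case the variances match, so the result comes entirely from the squared difference of the means.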
In MMGeneration, we provide two versions of FID calculation. One is the commonly used PyTorch version and the other is the one used in the StyleGAN paper. We have compared these two implementations on the StyleGAN2-FFHQ1024 model (the details can be found here). Fortunately, the difference in the final results is only marginal. Thus, we recommend that users adopt the more convenient PyTorch version.
About PyTorch version and Tero's version: The commonly used PyTorch version adopts the modified InceptionV3 network to extract features for real and fake images. However, Tero's FID requires a script module for Tensorflow InceptionV3. Note that applying this script module needs PyTorch >= 1.6.0.
About extracting real inception data: For convenience, we always extract the features for real images in advance. In MMGeneration, we provide tools/utils/inception_stat.py for users to prepare the real inception data. After running the following command, the extracted features will be saved in a pkl file.
python tools/utils/inception_stat.py --imgsdir ${IMGS_PATH} --pklname ${PKLNAME} --size ${SIZE}
The command above uses the PyTorch InceptionV3 by default. If you want to use Tero's InceptionV3, you need to switch to the script module:
python tools/utils/inception_stat.py --imgsdir ${IMGS_PATH} --pklname ${PKLNAME} --size ${SIZE} \
--inception-style stylegan --inception-pth ${PATH_SCRIPT_MODULE}
If you want to know more about how to extract the inception statistics, please refer to this doc.
To use the FID metric, you should add the metric in a config file like this:
metrics = dict(
fid50k=dict(
type='FID',
num_images=50000,
inception_pkl='work_dirs/inception_pkl/ffhq-256-50k-rgb.pkl',
bgr2rgb=True))
If inception_pkl is not set, the metric will calculate the real inception statistics on the fly. If you want to use Tero's InceptionV3, please use the following metric configuration:
metrics = dict(
fid50k=dict(
type='FID',
num_images=50000,
inception_pkl='work_dirs/inception_pkl/ffhq-1024-50k-stylegan.pkl',
inception_args=dict(
type='StyleGAN',
inception_path='work_dirs/cache/inception-2015-12-05.pt')))
The inception_path indicates the path to Tero's script module.
Our Precision and Recall implementation follows the version used in StyleGAN2. In this metric, a VGG network is adopted to extract the features for images. Unfortunately, we have not found a PyTorch VGG implementation that leads to results similar to Tero's version used in StyleGAN2. (About the differences, please see this file.) Thus, in our implementation, we adopt Tero's VGG network by default. Importantly, applying this script module needs PyTorch >= 1.6.0. With a lower PyTorch version, we will use the official PyTorch VGG network for feature extraction.
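As background, the precision and recall used in StyleGAN2 are computed from k-nearest-neighbour balls in feature space: precision is the fraction of fake features that fall inside the ball of at least one real feature, and recall swaps the two roles. The sketch below shows that idea on toy 2-D points. The helper names are hypothetical and the brute-force search is only suitable for tiny inputs; it is not the MMGeneration implementation, which operates on VGG features.

```python
import math


def kth_nn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour in the same set."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(queries, refs, k):
    """Fraction of queries lying inside at least one k-NN ball around refs."""
    radii = kth_nn_radii(refs, k)
    hits = 0
    for q in queries:
        if any(math.dist(q, r) <= rad for r, rad in zip(refs, radii)):
            hits += 1
    return hits / len(queries)


def precision_recall(real_feats, fake_feats, k=3):
    precision = coverage(fake_feats, real_feats, k)  # fakes inside the real manifold
    recall = coverage(real_feats, fake_feats, k)     # reals inside the fake manifold
    return precision, recall


real_feats = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
fake_feats = [(0.5, 0.5), (0.2, 0.1), (5.0, 5.0), (5.5, 5.0)]
print(precision_recall(real_feats, fake_feats, k=1))
```

Here two of the four fake points lie close to the real points, while the fake points cover only one of the four real points, so precision is higher than recall.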
To evaluate with P&R, please add the following configuration in the config file:
metrics = dict(
PR=dict(
type='PR',
num_images=50000))
Inception score is an objective metric for evaluating the quality of generated images, proposed in Improved Techniques for Training GANs. It uses an InceptionV3 model to predict the class of the generated images and assumes that 1) if an image is of high quality, it will be confidently categorized into a specific class, and 2) if the images are highly diverse, their predicted classes will cover a wide range. The KL divergence between the conditional class distribution and the marginal class distribution therefore indicates both the quality and the diversity of the generated images. You can see the complete implementation in metrics.py, which refers to https://github.com/sbarratt/inception-score-pytorch/blob/master/inception_score.py.
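The score itself is the exponential of the average KL divergence between each image's predicted class distribution and the marginal distribution over all images. A minimal sketch, assuming the class probabilities have already been produced by a classifier (the function name and the toy inputs are illustrative only):

```python
import math


def inception_score(probs):
    """exp of the mean KL(p(y|x) || p(y)) over a list of class-probability rows."""
    n = len(probs)
    num_classes = len(probs[0])
    marginal = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    kl_sum = 0.0
    for p in probs:
        # Terms with p[c] == 0 contribute nothing (0 * log 0 is taken as 0).
        kl_sum += sum(pc * math.log(pc / marginal[c])
                      for c, pc in enumerate(p) if pc > 0)
    return math.exp(kl_sum / n)


# Confident and diverse predictions: one image per class.
sharp = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
# Completely uncertain predictions.
flat = [[0.25, 0.25, 0.25, 0.25]] * 4
print(round(inception_score(sharp), 6), round(inception_score(flat), 6))
```

The confident and diverse predictions reach the maximum score for four classes, while the uniform predictions give the minimum score of 1. Practical implementations usually split the images into groups and report the mean and standard deviation over the groups, which this sketch omits.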
If you want to evaluate models with the IS metric, you can add it to the metrics field of your config file like this:
# at the end of the configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py
metrics = dict(
IS=dict(type='IS', num_images=106, image_shape=(3, 256, 256)))
You can run the command below to calculate IS.
python tools/utils/translation_eval.py --t photo \
./configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py \
https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210902_170442-c0958d50.pth?versionId=CAEQMhiBgICb8fTj3RciIGU2NmViM2QyYzJkODQ0MDBhYTFhMGE2YzNmMTA0ODk3 \
--eval IS
Note that the choice of Inception V3 implementation and image resize method can significantly influence the final IS score. Therefore, we strongly recommend that users download Tero's script model of Inception V3 (loading this script model requires torch >= 1.6) and use Bicubic interpolation with the Pillow backend. We provide a template for the data processing pipeline as well.
We also performed a survey on the influence of the data loading pipeline and the version of pretrained Inception V3 on the IS result. All IS values are evaluated on the same group of images, which are randomly selected from the ImageNet dataset.
Code Base | Inception V3 Version | Data Loader Backend | Resize Interpolation Method | IS |
---|---|---|---|---|
OpenAI (baseline) | Tensorflow | Pillow | Pillow Bicubic | 312.255 +/- 4.970 |
StyleGAN-Ada | Tero's Script Model | Pillow | Pillow Bicubic | 311.895 +/- 4.844 |
mmgen (Ours) | Pytorch Pretrained | cv2 | cv2 Bilinear | 322.932 +/- 2.317 |
mmgen (Ours) | Pytorch Pretrained | cv2 | cv2 Bicubic | 324.604 +/- 5.157 |
mmgen (Ours) | Pytorch Pretrained | cv2 | Pillow Bicubic | 318.161 +/- 5.330 |
mmgen (Ours) | Pytorch Pretrained | Pillow | Pillow Bilinear | 313.126 +/- 5.449 |
mmgen (Ours) | Pytorch Pretrained | Pillow | cv2 Bilinear | 318.021 +/- 3.864 |
mmgen (Ours) | Pytorch Pretrained | Pillow | Pillow Bicubic | 317.997 +/- 5.350 |
mmgen (Ours) | Tero's Script Model | cv2 | cv2 Bilinear | 318.879 +/- 2.433 |
mmgen (Ours) | Tero's Script Model | cv2 | cv2 Bicubic | 316.125 +/- 5.718 |
mmgen (Ours) | Tero's Script Model | cv2 | Pillow Bicubic | 312.045 +/- 5.440 |
mmgen (Ours) | Tero's Script Model | Pillow | Pillow Bilinear | 308.645 +/- 5.374 |
mmgen (Ours) | Tero's Script Model | Pillow | Pillow Bicubic | 311.733 +/- 5.375 |
Perceptual path length measures the difference between consecutive images (their VGG16 embeddings) when interpolating between two random inputs. Drastic changes mean that multiple features have changed together and that they might be entangled. Experiments suggest that a smaller PPL score indicates higher overall image quality.
As a basis for our metric, we use a perceptually-based pairwise image distance that is calculated as a weighted difference between two VGG16 embeddings, where the weights are fit so that the metric agrees with human perceptual similarity judgments.
If we subdivide a latent space interpolation path into linear segments, we can define the total perceptual length of this segmented path as the sum of perceptual differences over each segment. A natural definition for the perceptual path length would be the limit of this sum under infinitely fine subdivision, but in practice we approximate it using a small subdivision $`\epsilon=10^{-4}`$.
The average perceptual path length in latent space Z, over all possible endpoints, is therefore
$$`L_Z = E[\frac{1}{\epsilon^2}d(G(slerp(z_1,z_2;t)), G(slerp(z_1,z_2;t+\epsilon)))]`$$
Computing the average perceptual path length in latent space W is carried out in a similar fashion:
$$`L_W = E[\frac{1}{\epsilon^2}d(g(lerp(f(z_1),f(z_2);t)), g(lerp(f(z_1),f(z_2);t+\epsilon)))]`$$
where $`z_1, z_2 \sim P(z)`$, and $`t \sim U(0,1)`$ if we set sampling to full, $`t \in \{0,1\}`$ if we set sampling to end. $`G`$ is the generator (i.e., $`g \circ f`$ for style-based networks), and $`d(.,.)`$ evaluates the perceptual distance between the resulting images. We compute the expectation by taking 100,000 samples (set num_images to 50,000 in our code).
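The expectation above is estimated by averaging many single-sample terms. The sketch below shows one such term for the Z space, with spherical interpolation written out explicitly. The generator and distance are passed in as plain callables; in the demonstration they are toy stand-ins (an identity "generator" and a squared Euclidean distance), which is an assumption made only so the example runs without a network. It is not the metrics.py implementation.

```python
import math


def slerp(a, b, t):
    """Spherical interpolation between the directions of vectors a and b."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    a = [x / norm_a for x in a]
    b = [x / norm_b for x in b]
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)
    if omega < 1e-8:
        return a  # (nearly) parallel inputs: nothing to interpolate
    s = math.sin(omega)
    w_a = math.sin((1.0 - t) * omega) / s
    w_b = math.sin(t * omega) / s
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def ppl_sample(generator, distance, z1, z2, t, eps=1e-4):
    """One Monte Carlo term: d(G(slerp(t)), G(slerp(t + eps))) / eps^2."""
    img_a = generator(slerp(z1, z2, t))
    img_b = generator(slerp(z1, z2, t + eps))
    return distance(img_a, img_b) / eps ** 2


def toy_generator(z):
    return z  # stand-in for G: returns the latent itself


def toy_distance(x, y):
    return sum((u - v) ** 2 for u, v in zip(x, y))  # stand-in for d(., .)


value = ppl_sample(toy_generator, toy_distance, [1.0, 0.0], [0.0, 1.0], t=0.3)
print(value)
```

With these stand-ins the two endpoints are a quarter turn apart, so the printed value is close to the squared angle between them, (pi/2)^2.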
You can find the complete implementation in metrics.py, which refers to https://github.com/rosinality/stylegan2-pytorch/blob/master/ppl.py.
If you want to evaluate models with the PPL metric, you can add it to the metrics field of your config file like this:
# at the end of the configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py
metrics = dict(
ppl_wend=dict(type='PPL', space='W', sampling='end', num_images=50000, image_shape=(3, 1024, 1024)))
You can run the command below to calculate PPL.
python tools/evaluation.py ./configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py \
https://download.openmmlab.com/mmgen/stylegan2/stylegan2_c2_ffhq_1024_b4x8_20210407_150045-618c9024.pth \
--batch-size 2 --online --eval ppl_wend
Sliced Wasserstein distance is a discrepancy measure for probability distributions, and a smaller distance indicates that the generated images are closer to the real ones. We obtain the Laplacian pyramids of every image and extract patches from the Laplacian pyramids as descriptors. SWD is then calculated by taking the sliced Wasserstein distance between the real and fake descriptors.
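The "sliced" part of the distance can be sketched as follows: project both descriptor sets onto random unit directions, sort the projections, and average the absolute differences. The function below is a simplified illustration that assumes equally sized descriptor sets and skips the Laplacian pyramid and patch extraction; the name, the number of projections, and the fixed seed are assumptions.

```python
import math
import random


def sliced_wasserstein(desc_a, desc_b, num_projections=64, seed=0):
    """Average 1-D Wasserstein-1 distance over random unit directions."""
    if len(desc_a) != len(desc_b):
        raise ValueError('this sketch assumes equally sized descriptor sets')
    rng = random.Random(seed)
    dim = len(desc_a[0])
    total = 0.0
    for _ in range(num_projections):
        # Draw a random direction and normalise it to unit length.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in direction))
        direction = [x / norm for x in direction]
        # Project, sort, and compare the two 1-D distributions.
        proj_a = sorted(sum(x * w for x, w in zip(d, direction)) for d in desc_a)
        proj_b = sorted(sum(x * w for x, w in zip(d, direction)) for d in desc_b)
        total += sum(abs(x - y) for x, y in zip(proj_a, proj_b)) / len(proj_a)
    return total / num_projections


real_desc = [[0.0, 0.0, 1.0], [1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [0.5, 0.5, 0.5]]
shifted_desc = [[x + 3.0 for x in d] for d in real_desc]
print(sliced_wasserstein(real_desc, real_desc),
      sliced_wasserstein(real_desc, shifted_desc))
```

Identical descriptor sets give a distance of zero, and shifting one set away from the other makes the distance grow.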
You can see the complete implementation in metrics.py, which refers to https://github.com/tkarras/progressive_growing_of_gans/blob/master/metrics/sliced_wasserstein.py.
If you want to evaluate models with the SWD metric, you can add it to the metrics field of your config file like this:
# at the end of the configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.py
metrics = dict(swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 128, 128)))
You can run the command below to calculate SWD.
python tools/evaluation.py ./configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.py \
https://download.openmmlab.com/mmgen/pggan/pggan_celeba-cropped_128_g8_20210408_181931-85a2e72c.pth \
--batch-size 64 --online --eval swd16k
Multi-scale structural similarity is used to measure the similarity of two images. We use MS-SSIM here to measure the diversity of generated images, and a low MS-SSIM score indicates high diversity of the generated images. You can see the complete implementation in metrics.py, which refers to https://github.com/tkarras/progressive_growing_of_gans/blob/master/metrics/ms_ssim.py.
If you want to evaluate models with the MS-SSIM metric, you can add it to the metrics field of your config file like this:
# at the end of the configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.py
metrics = dict(ms_ssim10k=dict(type='MS_SSIM', num_images=10000))
You can run the command below to calculate MS-SSIM.
python tools/evaluation.py ./configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.py \
https://download.openmmlab.com/mmgen/pggan/pggan_celeba-cropped_128_g8_20210408_181931-85a2e72c.pth \
--batch-size 64 --online --eval ms_ssim10k
In this section, we discuss how to evaluate generative models, especially GANs, during training. Note that MMGeneration only supports distributed training, and the evaluation metric adopted in the training procedure should also be run in a distributed style. Currently, only FID has been implemented and tested in an efficient distributed version. Other metrics with efficient distributed versions will be supported in the near future. Thus, in the following part, we specify how to evaluate your models with the FID metric during training.
In eval_hooks.py, GenerativeEvalHook is provided to evaluate generative models during training. The most important argument for this hook is metrics. In fact, users can directly copy the configs in the last section to define the evaluation metric. To evaluate the model with the FID metric, please add the following Python code to your config file:
# define the evaluation keywords, otherwise evaluation will not be
# added in training
evaluation = dict(
type='GenerativeEvalHook',
interval=10000,
metrics=dict(
type='FID',
num_images=50000,
inception_pkl='path_to_inception_pkl',
bgr2rgb=True),
sample_kwargs=dict(sample_model='ema'))
We also provide TranslationEvalHook to evaluate translation models during training. You can use it in almost the same way as GenerativeEvalHook. The only difference is that you need to specify the target_domain of the evaluated images. To evaluate the model with the FID metric, please add the following Python code to your config file:
# define the evaluation keywords, otherwise evaluation will not be
# added in training
evaluation = dict(
type='TranslationEvalHook',
interval=10000,
target_domain='target_domain',
metrics=dict(
type='FID',
num_images=50000,
inception_pkl='path_to_inception_pkl',
bgr2rgb=True),
sample_kwargs=dict(sample_model='ema'))
For FID evaluation, our distributed version only takes about 400 seconds (roughly 7 minutes). Thus, it will not influence the training time significantly. In addition, users must also provide the val dataset, even though this metric does not use the files from this dataset:
data = dict(
samples_per_gpu=4,
train=dict(dataset=dict(imgs_root='./data/ffhq/ffhq_imgs/ffhq_256')),
val=dict(imgs_root='./data/ffhq/ffhq_imgs/ffhq_256'))
We highly recommend that users calculate the inception pickle file in advance, which significantly reduces the evaluation cost.
We also provide TranslationEvalHook for users to evaluate translation models during training. The only difference from GenerativeEvalHook is that you need to specify the target domain of the evaluated model. For example, to evaluate the model with the FID metric, please add the following Python code to your config file:
evaluation = dict(
type='TranslationEvalHook',
target_domain=target_domain,
interval=10000,
metrics=[
dict(type='FID', num_images=num_images, bgr2rgb=True)
])