- Create and activate a new conda environment by running
conda create -n vidstyleode python=3.10
conda activate vidstyleode
- Install the requirements by running
pip install -r requirements.txt
Please refer to the RAVDESS and Fashion Dataset official websites for instructions on downloading the datasets used in the paper. You may also experiment with your own dataset. The datasets should be arranged in the following structure
Folder1
Video_1.mp4
Video_2.mp4
..
Folder2
Video_1.mp4
Video_2.mp4
..
We recommend extracting the video frames beforehand for easier training. To extract the frames, please run the following command (an illustrative OpenCV-based sketch of this step is given after the expected output layout below)
python scripts/extract_video_frames.py \
--source_directory <path-to-video-directory> \
--target_directory <path-to-output-target-directory>
The output folder will have the following structure
Folder1_1
000.png
001.png
..
Folder1_2
000.png
001.png
..
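For reference, the following is a minimal sketch of what this extraction step does, written with OpenCV; the bundled scripts/extract_video_frames.py may use different options and naming, so treat it only as an illustration.
import os
import cv2  # OpenCV, assumed to be installed in the environment

def extract_frames(source_directory: str, target_directory: str) -> None:
    # Walk each dataset folder (e.g. Folder1, Folder2) and dump its videos as PNG frames.
    for folder in sorted(os.listdir(source_directory)):
        folder_path = os.path.join(source_directory, folder)
        if not os.path.isdir(folder_path):
            continue
        videos = sorted(f for f in os.listdir(folder_path) if f.endswith(".mp4"))
        for idx, video in enumerate(videos, start=1):
            out_dir = os.path.join(target_directory, f"{folder}_{idx}")  # e.g. Folder1_1
            os.makedirs(out_dir, exist_ok=True)
            cap = cv2.VideoCapture(os.path.join(folder_path, video))
            frame_id = 0
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                cv2.imwrite(os.path.join(out_dir, f"{frame_id:03d}.png"), frame)
                frame_id += 1
            cap.release()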
- Our method relies on a pretrained StyleGAN2 generator. Please download your pretrained generator checkpoint and provide its path in the training configuration file.
- For face videos (RAVDESS), we relied on the rosinality pretrained checkpoint. A converted checkpoint is available from the StyleCLIP official repository and can be downloaded from here.
- For full-body videos (Fashion Dataset), we relied on the pretrained checkpoint provided by StyleGAN-Human.
- For memory efficiency and to reduce computation during training, we precompute the StyleGAN W+ embedding vectors.
- Frame preprocessing: it is important to center-align your video frames before applying inversion, because StyleGAN generators usually generate aligned frames and image inversion pipelines typically center-align images before applying the StyleGAN inversion. If your videos are not center-aligned, please replace your video frames with their aligned versions.
- We rely on the official checkpoint of the pSp Inversion for our experiments on face videos (RAVDESS), and on the official checkpoint from StyleGAN-Human for our experiments on full-body videos (Fashion Dataset).
- Please refer to their official repositories for instructions on extracting the StyleGAN2 W+ embeddings. An embedding vector typically has the shape 1 x 18 x hidden_dims.
- The embeddings should be saved as .pt files and arranged in a structure that mirrors the video frames (a small saving sketch is given after the layout below).
Folder1_1
000.pt
001.pt
..
Folder1_2
000.pt
001.pt
..
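A minimal sketch of how the precomputed embeddings could be written to disk in this layout is shown below; encode_frame is a placeholder for your pSp or StyleGAN-Human inversion call, and the expected hidden_dims (512 for these generators) is an assumption.
import os
import torch

def encode_frame(frame_path: str) -> torch.Tensor:
    # Placeholder: replace with the actual pSp / StyleGAN-Human inversion call.
    # It should return a W+ tensor of shape 1 x 18 x hidden_dims.
    raise NotImplementedError("plug in your inversion pipeline here")

def precompute_inversions(img_root: str, inversion_root: str) -> None:
    # Mirror the frame folders (Folder1_1, Folder1_2, ...) with one .pt file per frame.
    for clip in sorted(os.listdir(img_root)):
        clip_dir = os.path.join(img_root, clip)
        if not os.path.isdir(clip_dir):
            continue
        out_dir = os.path.join(inversion_root, clip)
        os.makedirs(out_dir, exist_ok=True)
        for frame in sorted(f for f in os.listdir(clip_dir) if f.endswith(".png")):
            w_plus = encode_frame(os.path.join(clip_dir, frame))
            torch.save(w_plus, os.path.join(out_dir, frame.replace(".png", ".pt")))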
To enable style editing, you need to provide a textual description for each training video. Please store these descriptions in a file named text_descriptions.txt within the corresponding video frames folder (a short writing sketch is given after the example layout below). For example:
Folder1_1
000.pt
001.pt
..
text_descriptions.txt
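The snippet below is a small sketch of how such files could be written; the folder names and description strings are purely illustrative.
import os

img_root = "<path-to-videos-root-dir>"
# Hypothetical per-video descriptions; replace with your own captions.
descriptions = {
    "Folder1_1": "a person with a happy expression",
    "Folder1_2": "a person with a surprised expression",
}
for clip, text in descriptions.items():
    # Assumes the frame folders already exist (created during frame extraction).
    with open(os.path.join(img_root, clip, "text_descriptions.txt"), "w") as f:
        f.write(text + "\n")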
- Prepare a .txt file containing the video folder names for training and validation.
- Our splits for RAVDESS and the Fashion Dataset are provided under the data folder.
- Prepare a .yaml configuration file where you specify the video frames directory under img_root, the W+ inversion folder under inversion_root, and the training and validation .txt files under video_list (an illustrative fragment is shown below).
- Our config files for the RAVDESS and Fashion Dataset are provided under the configs folder.
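As an illustration, the relevant entries of such a configuration might look like the fragment below; the exact key nesting and the remaining fields differ, so please start from the provided files under the configs folder, and note that the paths and split filenames here are placeholders.
img_root: /path/to/extracted/frames          # directory of extracted video frames
inversion_root: /path/to/wplus/inversions    # directory of precomputed W+ .pt files
video_list: data/train_list.txt              # .txt file listing the video folder names for this split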
- To start the training, run the following command:
python main.py --name <tag-for-your-experiment> \
--base <path-to-config-file>
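For example, assuming a RAVDESS configuration file at configs/ravdess.yaml (the actual filename under the configs folder may differ):
python main.py --name ravdess_baseline --base configs/ravdess.yaml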
- To resume the training, run the following command:
python main.py --name <tag-for-your-experiment> \
--base <path-to-config-file> \
--resume <path-to-log-directory> or <path-to-checkpoint>
By default, the training checkpoints and figures will be logged under the logs folder as well as to wandb. Therefore, please log in to wandb by running
wandb login
To generate image animation results using the motion from a driving video, run the following script (driving videos are chosen randomly for each sample):
python scripts/image_animation.py \
--model_dir <log-dir-to-pretrained-model> \
--n_samples <number-of-samples-to-generate> \
--output_dir <path-to-save-dir> \
--n_frames <num-of-frames-to-generate-per-video> \
--spv <num-of-driving-videos-per-sample> \
--video_list <txt-file-of-possible-target-videos> \
--img_root <path-to-videos-root-dir> \
--inversion_root <path-to-frames-inversion-root-dir>
Instructions will be added later.
Instructions will be added later.
Instructions will be added later.
If you find this paper useful in your research, please consider citing:
@misc{ali2023vidstyleodedisentangledvideoediting,
title={VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs},
author={Moayed Haji Ali and Andrew Bond and Tolga Birdal and Duygu Ceylan and Levent Karacan and Erkut Erdem and Aykut Erdem},
year={2023},
eprint={2304.06020},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2304.06020},
}