Code for my Master's thesis, conducted within the Pattern Recognition & Bioinformatics research group at Delft University of Technology.
My supervisors:
- Dr. Jan van Gemert - Associate Professor and Head of the Computer Vision Lab
- Osman Semih Kayhan - PhD student
For pre-training and evaluation, two action recognition datasets need to be downloaded: HMDB51 and UCF101.
- Download the HMDB51 train/test splits from here
- Convert from avi to jpg:

  ```
  python utils/video_jpg_ucf101_hmdb51.py avi_video_directory jpg_video_directory
  ```
- Generate n_frames files for each video:

  ```
  python utils/n_frames_ucf101_hmdb51.py jpg_video_directory
  ```
- Generate json annotation files for each split, with `annotation_dir_path` containing the *.txt split files:

  ```
  python utils/hmdb51_json.py annotation_dir_path
  ```
- Download the UCF101 train/test splits from here
- Convert from avi to jpg (a rough sketch of this step is shown after these lists):

  ```
  python utils/video_jpg_ucf101_hmdb51.py avi_video_directory jpg_video_directory
  ```
- Generate n_frames files for each video:

  ```
  python utils/n_frames_ucf101_hmdb51.py jpg_video_directory
  ```
- Generate json annotation files for each split, with `annotation_dir_path` containing the *.txt split files:

  ```
  python utils/ucf101_json.py annotation_dir_path
  ```
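For orientation, the conversion and n_frames steps boil down to extracting one jpg per frame and recording the frame count per video folder. The sketch below is not the repo's implementation; it assumes ffmpeg is available on the PATH, the usual `<class>/<video>.avi` layout, and the common `image_%05d.jpg` frame naming:

```python
import os
import subprocess

def extract_frames(avi_dir, jpg_dir):
    # Rough equivalent of utils/video_jpg_ucf101_hmdb51.py: extract one jpg per frame.
    for class_name in os.listdir(avi_dir):
        class_dir = os.path.join(avi_dir, class_name)
        if not os.path.isdir(class_dir):
            continue
        for video in os.listdir(class_dir):
            if not video.endswith('.avi'):
                continue
            dst = os.path.join(jpg_dir, class_name, video[:-4])
            os.makedirs(dst, exist_ok=True)
            subprocess.call(['ffmpeg', '-i', os.path.join(class_dir, video),
                             os.path.join(dst, 'image_%05d.jpg')])

def write_n_frames(jpg_dir):
    # Rough equivalent of utils/n_frames_ucf101_hmdb51.py: record the frame count per video.
    for class_name in os.listdir(jpg_dir):
        class_dir = os.path.join(jpg_dir, class_name)
        if not os.path.isdir(class_dir):
            continue
        for video in os.listdir(class_dir):
            video_dir = os.path.join(class_dir, video)
            n_frames = len([f for f in os.listdir(video_dir) if f.endswith('.jpg')])
            with open(os.path.join(video_dir, 'n_frames'), 'w') as f:
                f.write(str(n_frames))
```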
❗ After all of these steps, the `data` folder should have the following structure:
```
data
│   hmdb51_1.json
│   hmdb51_2.json
│   hmdb51_3.json
│   ucf101_01.json
│   ucf101_02.json
│   ucf101_03.json
│
└───hmdb51_videos
│   └───jpg
│       └───brush_hair
│       │       folders with jpg and n_frames file for each brush_hair video
│       │
│       └─── ... (51 folders, one for each action class)
│       │
│       └───wave
│               folders with jpg and n_frames file for each wave video
│
└───ucf101_videos
    └───jpg
        └───ApplyEyeMakeup
        │       folders with jpg and n_frames file for each ApplyEyeMakeup video
        │
        └─── ... (101 folders, one for each action class)
        │
        └───YoYo
                folders with jpg and n_frames file for each YoYo video
```
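Optionally, a small sanity check (not part of the repo, paths assumed as in the tree above) can confirm that every video folder contains frames and an n_frames file:

```python
import os

def check_frames(jpg_root):
    # Report video folders that are missing jpg frames or the n_frames file.
    for class_name in sorted(os.listdir(jpg_root)):
        class_dir = os.path.join(jpg_root, class_name)
        if not os.path.isdir(class_dir):
            continue
        for video in os.listdir(class_dir):
            video_dir = os.path.join(class_dir, video)
            if not os.path.isfile(os.path.join(video_dir, 'n_frames')):
                print('missing n_frames:', video_dir)
            elif not any(f.endswith('.jpg') for f in os.listdir(video_dir)):
                print('no jpg frames:', video_dir)

check_frames(os.path.join('data', 'hmdb51_videos', 'jpg'))
check_frames(os.path.join('data', 'ucf101_videos', 'jpg'))
```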
The scripts can be run in an Anaconda environment on Windows or Linux.
You need to create an Anaconda 🐍 Python 3.6 environment named `nonexpert_video`. Inside that environment, some additional packages need to be installed. Run the following commands inside Anaconda Prompt ⌨:
```
(base) conda create -n nonexpert_video python=3.6 anaconda
(base) conda activate nonexpert_video
(nonexpert_video) conda install -c pytorch pytorch
(nonexpert_video) conda install -c pytorch torchvision
(nonexpert_video) conda install -c anaconda cudatoolkit
(nonexpert_video) conda install -c conda-forge tqdm
```
❗ For GPU support, an NVIDIA CUDA-compatible graphics card with proper drivers installed is needed.
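To quickly confirm that PyTorch was installed with working CUDA support (a generic check, not specific to this repo):

```
(nonexpert_video) python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```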
{ "data_folder": "data", "video_folder": "ucf101_videos", ucf101_videos or hmdb51_videos "frame_folder": "jpg", "annotation_file": "ucf101_01.json", ucf101_01.json or hmdb51_1.json for the 1st split "base_convnet": "resnet18", "simclr_out_dim": 256, "dataset_type": "ucf101", ucf101 or hmdb51 "num_classes": 101, 101 for UCF101 or 51 for HMDB51 "strength": 0.5, "temp": 0.5, "batch_size": 256, "frame_resize": 56, 56 or 224 "sampling_method": "rand32", "temporal_transform_type": "shift", shift, drop, shuffle, reverse "temporal_transform_step": 8, shift step size "same_per_clip": "True", False for Frame-mode and True for Chunk-mode "model_checkpoint_epoch": 0, if !=0, load from checkpoint file "model_checkpoint_file": ".ptm", PyTorch saved checkpoint for checkpoint epoch "num_epochs": 100, "num_workers": 4, DataLoader number of workers, set accordingly to number of GPUs }
- `train_videos_3d.py` for videoSimCLR pre-training
- `train_videos_3d_supervised.py` for fully supervised pre-training
- `Evaluation_ResNet18_3D_videos.ipynb` for videoSimCLR
- `Evaluation_ResNet18_3D_videos_kinetics.ipynb` for Kinetics pre-trained
- `Evaluation_ResNet18_3D_videos_supervised.ipynb` for supervised pre-trained
- `fine_tune.py`
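Assuming each script reads its settings from `config.txt` in the working directory (see the note on the data folder path below), a pre-training run would look like:

```
(nonexpert_video) python train_videos_3d.py
```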
For running the scripts from the `videoMOCO_scripts` folder, make sure to either copy the `data` folder inside the `videoMOCO_scripts` folder or change the data folder path in `config.txt` to point to the `data` folder in the repository root.
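For example, if the `data` folder is kept at the repository root, the path in the `videoMOCO_scripts` copy of `config.txt` could be adjusted along these lines (the exact relative path depends on where the scripts are launched from):

```
"data_folder": "../data",
```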
- Part of this code is inspired by the Hara et al. 3D ResNets repo.
- Part of the scripts in `videoMOCO_scripts` are adapted from the He et al. Momentum Contrast repo.