Code for my Master's thesis, conducted within the Pattern Recognition & Bioinformatics research group at Delft University of Technology.
My supervisors:
- Dr. Jan van Gemert - Associate Professor and Head of the Computer Vision Lab
- Osman Semih Kayhan - PhD student
For pre-training and evaluation, two action recognition datasets need to be downloaded: HMDB51 and UCF101.
- Download the HMDB51 train/test splits from here
- Convert from avi to jpg:

  ```
  python utils/video_jpg_ucf101_hmdb51.py avi_video_directory jpg_video_directory
  ```
- Generate n_frames files for each video:

  ```
  python utils/n_frames_ucf101_hmdb51.py jpg_video_directory
  ```
- Generate json annotation files for each split, with `annotation_dir_path` containing the *.txt split files:

  ```
  python utils/hmdb51_json.py annotation_dir_path
  ```
- Download the UCF101 train/test splits from here
- Convert from avi to jpg (a rough sketch of this step is shown after these lists):

  ```
  python utils/video_jpg_ucf101_hmdb51.py avi_video_directory jpg_video_directory
  ```
- Generate n_frames files for each video:

  ```
  python utils/n_frames_ucf101_hmdb51.py jpg_video_directory
  ```
- Generate json annotation files for each split, with `annotation_dir_path` containing the *.txt split files:

  ```
  python utils/ucf101_json.py annotation_dir_path
  ```
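For orientation, the conversion and n_frames steps boil down to extracting one jpg per frame and recording the frame count per video folder. The sketch below is not the repo's implementation; it assumes ffmpeg is available on the PATH, the usual `<class>/<video>.avi` layout, and the common `image_%05d.jpg` frame naming:

```python
import os
import subprocess

def extract_frames(avi_dir, jpg_dir):
    # Rough equivalent of utils/video_jpg_ucf101_hmdb51.py: extract one jpg per frame.
    for class_name in os.listdir(avi_dir):
        class_dir = os.path.join(avi_dir, class_name)
        if not os.path.isdir(class_dir):
            continue
        for video in os.listdir(class_dir):
            if not video.endswith('.avi'):
                continue
            dst = os.path.join(jpg_dir, class_name, video[:-4])
            os.makedirs(dst, exist_ok=True)
            subprocess.call(['ffmpeg', '-i', os.path.join(class_dir, video),
                             os.path.join(dst, 'image_%05d.jpg')])

def write_n_frames(jpg_dir):
    # Rough equivalent of utils/n_frames_ucf101_hmdb51.py: record the frame count per video.
    for class_name in os.listdir(jpg_dir):
        class_dir = os.path.join(jpg_dir, class_name)
        if not os.path.isdir(class_dir):
            continue
        for video in os.listdir(class_dir):
            video_dir = os.path.join(class_dir, video)
            n_frames = len([f for f in os.listdir(video_dir) if f.endswith('.jpg')])
            with open(os.path.join(video_dir, 'n_frames'), 'w') as f:
                f.write(str(n_frames))
```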
❗ After all of these steps, the `data` folder should have the following structure:
```
data
│   hmdb51_1.json
│   hmdb51_2.json
│   hmdb51_3.json
│   ucf101_01.json
│   ucf101_02.json
│   ucf101_03.json
│
└───hmdb51_videos
│   └───jpg
│       └───brush_hair
│       │       folders with jpg and n_frames file for each brush_hair video
│       │
│       └─── ... (51 folders, one for each action class)
│       │
│       └───wave
│               folders with jpg and n_frames file for each wave video
│
└───ucf101_videos
    └───jpg
        └───ApplyEyeMakeup
        │       folders with jpg and n_frames file for each ApplyEyeMakeup video
        │
        └─── ... (101 folders, one for each action class)
        │
        └───YoYo
                folders with jpg and n_frames file for each YoYo video
```
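Optionally, a small sanity check (not part of the repo, paths assumed as in the tree above) can confirm that every video folder contains frames and an n_frames file:

```python
import os

def check_frames(jpg_root):
    # Report video folders that are missing jpg frames or the n_frames file.
    for class_name in sorted(os.listdir(jpg_root)):
        class_dir = os.path.join(jpg_root, class_name)
        if not os.path.isdir(class_dir):
            continue
        for video in os.listdir(class_dir):
            video_dir = os.path.join(class_dir, video)
            if not os.path.isfile(os.path.join(video_dir, 'n_frames')):
                print('missing n_frames:', video_dir)
            elif not any(f.endswith('.jpg') for f in os.listdir(video_dir)):
                print('no jpg frames:', video_dir)

check_frames(os.path.join('data', 'hmdb51_videos', 'jpg'))
check_frames(os.path.join('data', 'ucf101_videos', 'jpg'))
```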
The scripts can be run in an Anaconda environment on Windows or Linux.
You need to create an Anaconda 🐍 Python 3.6 environment named `nonexpert_video`. Inside that environment, some additional packages need to be installed. Run the following commands inside Anaconda Prompt ⌨:
```
(base) conda create -n nonexpert_video python=3.6 anaconda
(base) conda activate nonexpert_video
(nonexpert_video) conda install -c pytorch pytorch
(nonexpert_video) conda install -c pytorch torchvision
(nonexpert_video) conda install -c anaconda cudatoolkit
(nonexpert_video) conda install -c conda-forge tqdm
```
❗ For GPU support, an NVIDIA CUDA-compatible graphics card with proper drivers installed is needed.
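To quickly confirm that PyTorch was installed with working CUDA support (a generic check, not specific to this repo):

```
(nonexpert_video) python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```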
{ "data_folder": "data", "video_folder": "ucf101_videos", ucf101_videos or hmdb51_videos "frame_folder": "jpg", "annotation_file": "ucf101_01.json", ucf101_01.json or hmdb51_1.json for the 1st split "base_convnet": "resnet18", "simclr_out_dim": 256, "dataset_type": "ucf101", ucf101 or hmdb51 "num_classes": 101, 101 for UCF101 or 51 for HMDB51 "strength": 0.5, "temp": 0.5, "batch_size": 256, "frame_resize": 56, 56 or 224 "sampling_method": "rand32", "temporal_transform_type": "shift", shift, drop, shuffle, reverse "temporal_transform_step": 8, shift step size "same_per_clip": "True", False for Frame-mode and True for Chunk-mode "model_checkpoint_epoch": 0, if !=0, load from checkpoint file "model_checkpoint_file": ".ptm", PyTorch saved checkpoint for checkpoint epoch "num_epochs": 100, "num_workers": 4, DataLoader number of workers, set accordingly to number of GPUs }
- `train_videos_3d.py` for videoSimCLR pre-training
- `train_videos_3d_supervised.py` for fully supervised pre-training
- `Evaluation_ResNet18_3D_videos.ipynb` for videoSimCLR
- `Evaluation_ResNet18_3D_videos_kinetics.ipynb` for Kinetics pre-trained
- `Evaluation_ResNet18_3D_videos_supervised.ipynb` for supervised pre-trained
- `fine_tune.py`
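Assuming each script reads its settings from `config.txt` in the working directory (see the note on the data folder path below), a pre-training run would look like:

```
(nonexpert_video) python train_videos_3d.py
```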
For running the scripts from the `videoMOCO_scripts` folder, make sure to either copy the `data` folder inside the `videoMOCO_scripts` folder or change the data folder path in `config.txt` to point to the `data` folder in the repository root.
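For example, if the `data` folder is kept at the repository root, the path in the `videoMOCO_scripts` copy of `config.txt` could be adjusted along these lines (the exact relative path depends on where the scripts are launched from):

```
"data_folder": "../data",
```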
- Part of this code is inspired by the Hara et al. 3D ResNets repo.
- Part of the scripts in `videoMOCO_scripts` are adapted from the He et al. Momentum Contrast repo.