As part of my Final Year Project, I trained the YOLO object detector on the CrowdHuman dataset. The goal was to achieve fast detection of people and faces in crowds.
About a year later, I've realized that this repo now comes up as one of the top links in a Google search for "Crowdhuman", which is nice to see. However, I keep getting emails asking for the weights I trained.
I no longer have access to the weights; they were trained and saved on the Imperial College EEE network drives. I am no longer a student at Imperial College (having graduated, thanks to this project), and as such I no longer have access to the network.
TL;DR: The weights are lost in the depths of the EEE network oceans.
I've left this repository up as a GUIDE to train the network. It's not too hard and didn't take too long: about 24 hours on a 4GB GTX 1050 Ti.
Darknet is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation.
For more information see the Darknet project website.
For questions or issues please use the Google Group.
From the CrowdHuman website:
CrowdHuman is a benchmark dataset to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, rich-annotated and contains high diversity. CrowdHuman contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. There are a total of 470K human instances from train and validation subsets and 23 persons per image, with various kinds of occlusions in the dataset. Each human instance is annotated with a head bounding-box, human visible-region bounding-box and human full-body bounding-box. We hope our dataset will serve as a solid baseline and help promote future research in human detection tasks.
The CrowdHuman dataset can be downloaded from here.
The training set is divided into 3 files, each between 2-3GB zipped:
CrowdHuman_train01.zip
CrowdHuman_train02.zip
CrowdHuman_train03.zip
A validation set is also provided in CrowdHuman_val.zip
Both the training and validation sets come with annotations.
annotation_train.odgt
annotation_val.odgt
The annotations come in the .odgt format. Each line in the file is a JSON object containing the annotations for the image it refers to.
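For reference, a single line looks roughly like the following. This is only an illustration: the field names are what I recall from the CrowdHuman documentation (ID, gtboxes, with hbox/vbox/fbox as [x, y, w, h] pixel boxes), and the values are placeholders, not real data.
{"ID": "some_image_id", "gtboxes": [{"tag": "person", "hbox": [x, y, w, h], "vbox": [x, y, w, h], "fbox": [x, y, w, h], "head_attr": {...}, "extra": {...}}, ...]}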
The test set is provided in CrowdHuman_test.zip. As far as I can tell, there are no labels for the test set.
Training on Darknet is never fun. There are about a million different tutorials on how to set up the files, what to do, and what files to change. Many tutorials are out of date. I'm about to contribute to this mess.
Note: This tutorial is not meant to be a general 'how to'. It is how I managed to train YOLO.
I trained the yolov3-tiny model with 2 classes on:
- Ubuntu 16.04 with a standard Darknet setup (GPU, OPENCV, CUDNN)
- GTX 1050 Ti (4GB RAM)
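To make yolov3-tiny predict 2 classes, the usual Darknet recipe (not anything specific to this repo) is to edit the cfg: set classes=2 in both [yolo] layers, and set filters=(classes + 5) * 3 = 21 in the [convolutional] layer directly above each [yolo] layer. Roughly, the edited sections of yolov3-tiny-crowdhuman.cfg would look like this, with everything else left as in the stock yolov3-tiny.cfg:

[convolutional]
size=1
stride=1
pad=1
# filters = (classes + 5) * 3 = 21 when classes = 2
filters=21
activation=linear

[yolo]
# anchors, masks, etc. left as in the stock yolov3-tiny.cfg
classes=2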
- Download the 3 training zip files CrowdHuman_train0*.zip from the CrowdHuman website.
- Extract all the files into the folder provided, /darknet/crowdhuman_train. You should have about 15000 files in the folder.
- Download the validation zip file CrowdHuman_val.zip.
- Extract the validation set into /darknet/crowdhuman_val. I think there's about 4370 files in there.
- Download both annotation_train.odgt and annotation_val.odgt and place them in the main /darknet/ folder.
CrowdHuman provides its annotations in the .odgt JSON format. Darknet does not like this. Darknet expects its annotations as follows:
- Each image has a corresponding text file containing the annotations.
  - Example: dog.jpg would have its annotations in dog.txt, in the same folder.
- Annotations in Darknet look like this:
  <object-class> <x> <y> <width> <height>
- Each line in the text file is of that format, and each line represents one object.
- x, y is the centre of the bounding box. width and height are the width and height of the bounding box.
- All these values need to be scaled with respect to the size of the image:
  x = xCentre / imgWidth
  y = yCentre / imgHeight
  width = widthBoundingBox / imgWidth
  height = heightBoundingBox / imgHeight
- All the values should be between 0 and 1.
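As a concrete illustration, here is a minimal sketch of that scaling, assuming the source box is given as a top-left corner plus width and height in pixels (which, as far as I remember, is how CrowdHuman stores its boxes):

def to_darknet(box, img_w, img_h, object_class=0):
    """Convert a pixel box [x_min, y_min, w, h] into one Darknet annotation line."""
    x_min, y_min, w, h = box
    # move from top-left corner to box centre
    x_centre = x_min + w / 2.0
    y_centre = y_min + h / 2.0
    # scale everything by the image size so all values land in [0, 1]
    x = x_centre / img_w
    y = y_centre / img_h
    width = w / img_w
    height = h / img_h
    return f"{object_class} {x:.6f} {y:.6f} {width:.6f} {height:.6f}"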
I have written some scripts which convert the annotations and create the text files:
crowdhuman_train_anno.py
crowdhuman_val_anno.py
Simply run those two scripts from the terminal (python crowdhuman_*_anno.py) from this folder, and they will generate all the corresponding text files in the /darknet/crowdhuman_train and /darknet/crowdhuman_val directories.
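For reference, a stripped-down sketch of the kind of conversion those scripts perform looks roughly like this. This is not the repo's actual code: the odgt field names, the class-ID mapping, the <ID>.jpg filename convention, and the use of PIL to read the image size are all my assumptions, and it reuses the to_darknet helper sketched above.

import json
import os
from PIL import Image  # used only to read image dimensions

IMG_DIR = "crowdhuman_train"           # assumed layout: images live here as <ID>.jpg
ANNO_FILE = "annotation_train.odgt"
CLASSES = {"fbox": 0, "hbox": 1}       # assumed mapping: 0 = full body, 1 = head

with open(ANNO_FILE) as f:
    for line in f:
        record = json.loads(line)                      # one JSON object per line
        img_path = os.path.join(IMG_DIR, record["ID"] + ".jpg")
        if not os.path.exists(img_path):
            continue
        img_w, img_h = Image.open(img_path).size
        out_lines = []
        for gt in record["gtboxes"]:
            if gt.get("tag") != "person":              # skip ignore/"mask" regions
                continue
            for key, cls in CLASSES.items():
                out_lines.append(to_darknet(gt[key], img_w, img_h, cls))
        # write dog.txt next to dog.jpg, as Darknet expects
        with open(os.path.splitext(img_path)[0] + ".txt", "w") as out:
            out.write("\n".join(out_lines) + "\n")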
Darknet also needs another text file which contains the paths to all the training and validation images. I have written some scripts to do this:
generate_train_txt.py
generate_val_txt.py
These generate two files, train.txt and val.txt, in the main /darknet/ directory.
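A minimal sketch of what that generation amounts to (the glob pattern and output filename here are assumptions based on the folder layout above):

import glob
import os

# collect the absolute paths of all training images and write them one per line
image_paths = sorted(glob.glob(os.path.join(os.getcwd(), "crowdhuman_train", "*.jpg")))
with open("train.txt", "w") as f:
    f.write("\n".join(image_paths) + "\n")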
Just run the command:
./darknet detector train cfg/yolo_crowdhuman.data cfg/yolov3-tiny-crowdhuman.cfg darknet53.conv.74
where darknet53.conv.74 is the file of initial weights, which you can get with:
wget https://pjreddie.com/media/files/darknet53.conv.74
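The yolo_crowdhuman.data file referenced in the command just tells Darknet where everything lives. I no longer have the original, but a standard Darknet .data file for this setup would look something like the following (the names-file path and backup directory are my assumptions):

classes = 2
train = train.txt
valid = val.txt
names = data/crowdhuman.names
backup = backup/

The names file simply lists the two class names, one per line, in the same order as the class IDs used in the annotation text files.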
- No GUI, to save GPU memory
  - You might run out of memory on your GPU, so a good hack is to just run the training without any GUI. I used the virtual terminal tty1 (or press CTRL + ALT + F1).
  - I killed the GUI by running sudo service lightdm stop. This left me with just a terminal, and I trained the network there.
View Results:
./darknet detector test cfg/yolo_crowdhuman.data cfg/yolov3-tiny-crowdhuman.cfg backup/yolov3-tiny-crowdhuman_30000.weights <path to image>