This repository contains an implementation of YOLOv2 and is based on Vivek Maskara's blog, which in turn follows Yumi's blog. The backbone network is Joseph Redmon's Darknet, but in this repository it is of course implemented in TensorFlow.
The following Python modules are required to run the code
Mathematical details on the loss function can be found in this notebook. One major difference is how the term in the confidence loss that computes the IoU of a predicted bounding box is implemented: it uses tf.gather_nd() and tf.scatter_nd() calls.
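The exact tensors used in loss_function.py are not reproduced in this README, so the following is only a sketch of the gather/scatter pattern, written in NumPy so it runs without TensorFlow. It assumes boxes in (x_centre, y_centre, width, height) format on an S x S grid with B anchors per cell, and that the IoU is taken against the ground-truth box assigned to each position; the helper names iou_xywh and iou_on_grid are hypothetical. Indexing with a tuple of index arrays plays the role of tf.gather_nd(), and np.add.at plays the role of tf.scatter_nd().

```python
import numpy as np


def iou_xywh(boxes_a, boxes_b):
    """Element-wise IoU of boxes given as (x_centre, y_centre, width, height).

    Hypothetical helper, not taken from the repository. Inputs have shape
    (..., 4) and must broadcast against each other.
    """
    a_min = boxes_a[..., :2] - boxes_a[..., 2:] / 2.0
    a_max = boxes_a[..., :2] + boxes_a[..., 2:] / 2.0
    b_min = boxes_b[..., :2] - boxes_b[..., 2:] / 2.0
    b_max = boxes_b[..., :2] + boxes_b[..., 2:] / 2.0
    # Overlap along x and y, clipped at zero for disjoint boxes
    inter_wh = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0.0, None)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    union = boxes_a[..., 2] * boxes_a[..., 3] + boxes_b[..., 2] * boxes_b[..., 3] - inter
    return inter / np.maximum(union, 1e-12)


def iou_on_grid(pred_boxes, true_boxes, indices):
    """Compute IoU only at grid positions that hold a ground-truth box.

    Hypothetical helper. pred_boxes: (S, S, B, 4) predictions for every cell
    and anchor; true_boxes: (N, 4); indices: (N, 3) integer (row, col, anchor).
    Returns an (S, S, B) array that is zero except at `indices`.
    """
    idx = tuple(indices.T)
    # NumPy analogue of tf.gather_nd(pred_boxes, indices): shape (N, 4)
    gathered = pred_boxes[idx]
    values = iou_xywh(gathered, true_boxes)
    # NumPy analogue of tf.scatter_nd(indices, values, shape); like scatter_nd,
    # np.add.at sums values that land on the same index
    grid = np.zeros(pred_boxes.shape[:-1])
    np.add.at(grid, idx, values)
    return grid
```

Gathering first means the IoU is evaluated only for the N positions that carry an object rather than for all S*S*B predictions; scattering back restores a dense tensor that can presumably be combined element-wise with the other loss terms.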
When training with the training notebook, Darknet is initialised with Joseph Redmon's weights for Darknet. These weights can be downloaded with
!wget https://pjreddie.com/media/files/yolov2.weights
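darknet.py initialises the model from this file. Its loader is not shown here, but the sketch below illustrates how such a file can be consumed, assuming the layout written by Darknet's reference C code: three int32 values (major, minor, revision), a "seen" counter stored as int64 when major * 10 + minor >= 2 and as int32 otherwise, and then one flat float32 array from which each convolution + batch-norm layer takes its parameters in turn. The function names parse_darknet_weights and take_conv_bn are hypothetical, and the repository's darknet.py may organise this differently.

```python
import numpy as np


def parse_darknet_weights(data):
    """Split the raw bytes of a Darknet .weights file into header and parameters.

    Hypothetical helper. Assumes little-endian int32 major, minor, revision,
    then 'seen' (int64 if major * 10 + minor >= 2, else int32), then float32s.
    """
    major, minor, revision = (int(v) for v in np.frombuffer(data, dtype="<i4", count=3))
    seen_dtype = np.dtype("<i8") if major * 10 + minor >= 2 else np.dtype("<i4")
    seen = int(np.frombuffer(data, dtype=seen_dtype, count=1, offset=12)[0])
    weights = np.frombuffer(data, dtype="<f4", offset=12 + seen_dtype.itemsize)
    return (major, minor, revision, seen), weights


def take_conv_bn(weights, offset, in_channels, out_channels, kernel_size):
    """Slice one convolution + batch-norm layer starting at `offset`.

    Hypothetical helper. Assumed Darknet ordering: beta (bias), gamma (scale),
    moving mean, moving variance (each of length out_channels), then the
    kernel in (out, in, height, width) order.
    """
    n = out_channels
    beta, gamma, mean, variance = (weights[offset + i * n: offset + (i + 1) * n] for i in range(4))
    offset += 4 * n
    size = out_channels * in_channels * kernel_size * kernel_size
    kernel = weights[offset: offset + size].reshape(out_channels, in_channels, kernel_size, kernel_size)
    # Reorder to the (height, width, in, out) kernel layout used by TensorFlow convolutions
    kernel = kernel.transpose(2, 3, 1, 0)
    return (beta, gamma, mean, variance, kernel), offset + size
```

In use, one would pass the bytes of the downloaded file (for example open("yolov2.weights", "rb").read()) to parse_darknet_weights and then call take_conv_bn once per layer, carrying the returned offset forward; when every layer has been consumed the offset should equal len(weights).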
The anchor boxes are generated with k-means clustering, as described in the YOLO9000 paper. This is done for both the PascalVOC and the COCO datasets in this notebook. The results are stored in the JSON files anchor_boxes_coco.json and anchor_boxes_pascalvoc.json.
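The clustering in anchor_boxes.py is not reproduced here; the following NumPy sketch only illustrates the idea. Following YOLO9000, it clusters box sizes with the distance d = 1 - IoU instead of Euclidean distance, so that large boxes do not dominate the error. It assumes boxes are reduced to (width, height) pairs compared as if they shared the same centre, and that centroids are initialised from k boxes drawn at random; the names wh_iou and kmeans_iou are hypothetical.

```python
import numpy as np


def wh_iou(wh, centroids):
    """IoU between (width, height) pairs when boxes and centroids share a centre.

    Hypothetical helper. wh: (N, 2); centroids: (K, 2). Returns an (N, K) matrix.
    """
    inter = (np.minimum(wh[:, None, 0], centroids[None, :, 0])
             * np.minimum(wh[:, None, 1], centroids[None, :, 1]))
    box_area = wh[:, 0] * wh[:, 1]
    centroid_area = centroids[:, 0] * centroids[:, 1]
    return inter / (box_area[:, None] + centroid_area[None, :] - inter)


def kmeans_iou(wh, k, n_iter=100, seed=0):
    """Hypothetical k-means on box sizes using the distance d = 1 - IoU."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k boxes drawn at random without replacement
    centroids = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    assign = None
    for _ in range(n_iter):
        dist = 1.0 - wh_iou(wh, centroids)
        new_assign = dist.argmin(axis=1)
        if assign is not None and np.array_equal(assign, new_assign):
            break  # assignments stopped changing: converged
        assign = new_assign
        for j in range(k):
            members = wh[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign
```

The YOLO9000 paper reports k = 5 as a good trade-off between recall and model complexity, so a call such as kmeans_iou(wh, k=5) on all ground-truth box sizes of a dataset would yield five anchor shapes.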
The following figure shows the anchor boxes for the COCO (left) and PascalVOC (right) datasets.
All code is contained in the src subdirectory. The following notebooks implement the high-level algorithms:
- GenerateAnchorBoxes.ipynb: Generate anchor boxes for the YOLOv2 model, both for the COCO and the PascalVOC dataset.
- ReadImages.ipynb: Read some images from the COCO and PascalVOC datasets and visualise them with ground-truth annotations.
- TrainModel.ipynb: Train darknet with the YOLOv2 loss function and save the weights to a specified directory.
- EvaluateModel.ipynb: Generate some predictions with a trained model and visualise them together with the ground-truth bounding boxes.
The following Python files implement the key classes and functions:
- anchor_boxes.py: Auxiliary functions for dealing with anchor boxes. Implements k-means clustering with the IoU metric.
- darknet.py: The Darknet model with weights initialised from a specified data file.
- data_generator.py: Class for generating training data in the TensorFlow dataset format. The data generator classes operate on image readers.
- image_reader.py: Classes for reading annotated images from the COCO and PascalVOC datasets using the OpenCV module.
- loss_function.py: YOLOv2 loss function, see here for a detailed mathematical discussion.
- post_processing.py: Functions for post-processing predictions. Currently this implements standard non-max suppression (NMS) and soft-NMS.
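post_processing.py is described as implementing standard NMS and soft-NMS. As an illustration only, here is a minimal NumPy sketch of both, following the rescoring rules of Bodla et al.: after the highest-scoring box is kept, each remaining box's score is set to zero if its IoU with the kept box is at or above a threshold (hard NMS), multiplied by 1 - IoU in that case (linear soft-NMS), or always multiplied by exp(-IoU^2 / sigma) (Gaussian soft-NMS). Boxes are assumed to be in corner format (x_min, y_min, x_max, y_max); the function names and the default values of iou_threshold, sigma and score_threshold are hypothetical choices, not the repository's settings.

```python
import numpy as np


def _area(b):
    """Area of boxes in (x_min, y_min, x_max, y_max) format."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def box_iou(box, boxes):
    """IoU of one box with each row of `boxes` (hypothetical helper)."""
    lo = np.maximum(box[:2], boxes[:, :2])
    hi = np.minimum(box[2:], boxes[:, 2:])
    inter = np.prod(np.clip(hi - lo, 0.0, None), axis=1)
    return inter / (_area(box) + _area(boxes) - inter)


def soft_nms(boxes, scores, method="gaussian", iou_threshold=0.5, sigma=0.5, score_threshold=0.001):
    """Hypothetical greedy (soft-)NMS; returns kept indices and their final scores.

    method="hard" is standard NMS; "linear" and "gaussian" are soft-NMS variants.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = list(range(len(scores)))
    keep, kept_scores = [], []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        if scores[best] < score_threshold:
            break  # every remaining score is below the threshold
        keep.append(best)
        kept_scores.append(scores[best])
        remaining.remove(best)
        if not remaining:
            break
        rest = np.array(remaining)
        ious = box_iou(boxes[best], boxes[rest])
        if method == "hard":
            decay = np.where(ious >= iou_threshold, 0.0, 1.0)
        elif method == "linear":
            decay = np.where(ious >= iou_threshold, 1.0 - ious, 1.0)
        elif method == "gaussian":
            decay = np.exp(-(ious ** 2) / sigma)
        else:
            raise ValueError(f"unknown method: {method}")
        scores[rest] *= decay  # rescore instead of (or as well as) discarding
    return keep, kept_scores
```

With method="hard" this reduces to ordinary greedy NMS. The soft variants keep heavily overlapping boxes with reduced scores, which is the motivation given in the soft-NMS paper for scenes where two objects genuinely overlap.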
The code can process images from the COCO and PascalVOC datasets. I used the 2017 train/val data from COCO and the VOC2012 data. The relevant classes COCOImageReader and PascalVOCImageReader are implemented in image_reader.py.
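image_reader.py uses OpenCV for the pixel data; on the annotation side, PascalVOC stores one XML file per image. The sketch below shows one way to read such a file with the Python standard library. It assumes the usual VOC layout (an annotation root with a filename element and one object element per instance, each holding a name and a bndbox with xmin, ymin, xmax, ymax); parse_voc_annotation is a hypothetical name, and PascalVOCImageReader may well work differently.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Return (filename, [(class_name, x_min, y_min, x_max, y_max), ...]).

    Hypothetical helper for a PascalVOC annotation string. Only direct
    'object' children of the root are read, so nested 'part' boxes are ignored.
    """
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # Coordinates are parsed as floats because some files store them that way
        coords = tuple(float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), *coords))
    return filename, objects
```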
Loss history from training on the entire trainval PascalVOC 2012 dataset, validating on the val subset.
The following image shows an example detection. Predicted bounding boxes are shown in cyan, the ground truth in yellow.
- Redmon, J., Divvala, S., Girshick, R. and Farhadi, A., 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788). arxiv:1506.02640
- Redmon, J. and Farhadi, A., 2017. YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263-7271). arxiv:1612.08242
- Bodla, N., Singh, B., Chellappa, R. and Davis, L.S., 2017. Soft-NMS: Improving object detection with one line of code. arxiv:1704.04503