dourflow: Keras implementation of YOLO v2

dourflow is a Keras/TensorFlow implementation of the state-of-the-art object detection system You Only Look Once (YOLO).

Original paper and github: YOLO9000: Better, Faster, Stronger & Darknet.


Simple use

  1. Clone repository:

git clone

  2. Download pretrained model: 3.5 / 3.6 and place it in dourflow/.

  3. Predict on an image:

python3 bird.jpg

  4. Use on webcam (press 'q' to quit):

python3 cam


Running python3 --help:

usage: [-h] [-m MODEL] [-c CONF] [-t THRESHOLD] [-w WEIGHT_FILE]

dourflow: a keras YOLO V2 implementation.

positional arguments:
  action                what to do: 'train', 'validate', 'cam' or pass a
                        video, image file/dir.

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        path to input yolo v2 keras model
  -c CONF, --conf CONF  path to configuration file
  -t THRESHOLD, --threshold THRESHOLD
                        detection threshold
  -w WEIGHT_FILE, --weight_file WEIGHT_FILE
                        path to weight file
  --gif                 video output stored as gif also

action [positional]

Pass what to do with dourflow:

  1. A path to an image file/dir or video: Run inference on those file(s).
  2. 'cam': Run inference on webcam ('cams' to store the output).
  3. 'validate': Perform validation on a trained model.
  4. 'train': Perform training on your own dataset.

model [-m]

Pass the Keras model .h5 file to use for inference, validation, or as the starting point for transfer learning.

Pretrained COCO/VOC keras models can be downloaded here. Alternatively, you can download the weights from here and generate the model file using YAD2K.

conf [-c]

Pass a config.json file that looks like this (minus the comments!):

    {
        "model" : {
            "input_size":       416, #Net input w,h size in pixels
            "grid_size":        13, #Grid size
            "true_box_buffer":  10, #Maximum number of objects detected per image
            "iou_threshold":    0.5, #Intersection over union detection threshold
            "nms_threshold":    0.3 #Non max suppression threshold
        },
        "config_path" : {
            "labels":           "models/coco/labels_coco.txt", #Path to labels file
            "anchors":          "models/coco/anchors_coco.txt", #Path to anchors file
            "arch_plotname":    "" #Model output name (leave empty for none, see result_plots/yolo_arch.png for an example)
        },
        "train": {
            "out_model_name":   "", #Trained model name (saved during checkpoints)
            "image_folder":     "", #Training data, image directory
            "annot_folder":     "", #Training data, annotations directory (use VOC format)
            "batch_size":       16, #Training batch size
            "learning_rate":    1e-4, #Training learning rate
            "num_epochs":       20, #Number of epochs to train for
            "object_scale":     5.0, #Loss function constant parameter
            "no_object_scale":  1.0, #Loss function constant parameter
            "coord_scale":      1.0, #Loss function constant parameter
            "class_scale":      1.0, #Loss function constant parameter
            "verbose":          1 #Training verbosity
        },
        "valid": {
            "image_folder":     "", #Validation data, image directory
            "annot_folder":     "", #Validation data, annotation directory
            "pred_folder":      "" #Validation data, predicted images directory (leave empty for no predicted image output)
        }
    }
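To illustrate how the "model" fields relate to each other, here is a minimal sketch that parses a config and derives the pixel size of one grid cell. The helper name `load_config` and the divisibility check are assumptions made for this example; they are not taken from dourflow's code.

```python
import json


def load_config(text):
    """Parse a config JSON string (hypothetical helper, not part of dourflow)."""
    config = json.loads(text)
    model = config["model"]
    # Assumption: the input size should be a whole multiple of the grid size.
    if model["input_size"] % model["grid_size"] != 0:
        raise ValueError("input_size should be a multiple of grid_size")
    return config


# Only the "model" section is shown; the comments must be removed for valid JSON.
example = """
{
    "model": {
        "input_size": 416,
        "grid_size": 13,
        "true_box_buffer": 10,
        "iou_threshold": 0.5,
        "nms_threshold": 0.3
    }
}
"""

config = load_config(example)
cell_size = config["model"]["input_size"] // config["model"]["grid_size"]
print(cell_size)  # 32 pixels per grid cell
```

With the default values, each of the 13 x 13 grid cells covers a 32 x 32 pixel region of the 416 x 416 input.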

threshold [-t]

Pass the confidence threshold used for detection (default is 30%).
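As a rough illustration of how a confidence threshold and the "nms_threshold" from the config could interact, the sketch below drops low-confidence boxes and then applies greedy, class-agnostic non-max suppression. The function names, the box format (xmin, ymin, xmax, ymax), and the use of `>=` for the cut-off are assumptions for this example, not dourflow's actual implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, threshold=0.3, nms_threshold=0.3):
    """Keep confident boxes, then suppress boxes overlapping a stronger one.

    detections: list of (box, score) pairs.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep the box only if it does not overlap too much with a kept box.
        if all(iou(box, other) <= nms_threshold for other, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((10, 10, 110, 110), 0.90),
    ((12, 12, 112, 112), 0.60),    # overlaps heavily with the first box
    ((200, 200, 260, 260), 0.35),
    ((300, 300, 340, 340), 0.10),  # below the confidence threshold
]
kept = filter_detections(detections, threshold=0.35)
print([score for _, score in kept])  # [0.9, 0.35]
```

Raising `-t` removes more low-confidence boxes before suppression; lowering it keeps more candidates at the cost of more false positives.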


Single Image/Video

Generates an output file in the same directory as the input, with '_pred' appended to the file name. Example:

python3 theoffice.png -m coco_model.h5 -c coco_config.json -t 0.35

Batch Images

Creates a directory named out/ in the current working directory and writes each predicted image there under its original file name.


python3 images/ -m coco_model.h5 -c coco_config.json -t 0.35


Validation evaluates the performance of a model by computing its mean Average Precision (mAP) on the object detection task (mAP write-up coming soon).


python3 validate -m voc_model.h5 -c voc_config.json


Batch Processed: 100%|████████████████████████████████████████████| 4282/4282 [01:53<00:00, 37.84it/s]
AP( cat ): 0.908
AP( train ): 0.907
AP( dog ): 0.899
AP( bird ): 0.814
AP( aeroplane ): 0.810
AP( cow ): 0.810
AP( bus ): 0.806
AP( motorbike ): 0.792
AP( person ): 0.737
AP( sheep ): 0.719
AP( tvmonitor ): 0.718
AP( sofa ): 0.701
AP( bicycle ): 0.683
AP( diningtable ): 0.665
AP( car ): 0.641
AP( boat ): 0.617
AP( horse ): 0.575
AP( pottedplant ): 0.568
AP( chair ): 0.528
AP( bottle ): 0.487
mAP: 0.719
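The final mAP line is the unweighted mean of the per-class AP values. The short sketch below parses the report above and reproduces that number; the regular expression and the variable names are choices made for this example only.

```python
import re

report = """\
AP( cat ): 0.908
AP( train ): 0.907
AP( dog ): 0.899
AP( bird ): 0.814
AP( aeroplane ): 0.810
AP( cow ): 0.810
AP( bus ): 0.806
AP( motorbike ): 0.792
AP( person ): 0.737
AP( sheep ): 0.719
AP( tvmonitor ): 0.718
AP( sofa ): 0.701
AP( bicycle ): 0.683
AP( diningtable ): 0.665
AP( car ): 0.641
AP( boat ): 0.617
AP( horse ): 0.575
AP( pottedplant ): 0.568
AP( chair ): 0.528
AP( bottle ): 0.487
"""

# Map each class name to its average precision.
ap = {
    match.group(1): float(match.group(2))
    for match in re.finditer(r"AP\( (\w+) \): ([0-9.]+)", report)
}

mean_ap = sum(ap.values()) / len(ap)
print(f"mAP: {mean_ap:.3f}")  # mAP: 0.719
```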


Split dataset

Script to generate training/testing splits.

python3 -p 0.75 --in_ann VOC2012/Annotations/ --in_img VOC2012/JPEGImages/ --output ~/Documents/DATA/VOC
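For readers who want to see what such a split involves, here is a minimal sketch of a shuffled train/test split. It assumes that `-p 0.75` is the fraction of samples assigned to training; the function name `split_dataset` and the fixed seed are hypothetical, and the real script may work differently.

```python
import random


def split_dataset(names, train_fraction=0.75, seed=0):
    """Shuffle sample names and split them into (train, test) lists."""
    names = sorted(names)                # fixed order before shuffling
    random.Random(seed).shuffle(names)   # seeded for reproducibility
    cut = int(round(train_fraction * len(names)))
    return names[:cut], names[cut:]


# Hypothetical sample names; each would map to one image and one annotation.
names = [f"img_{i:03d}" for i in range(8)]
train, test = split_dataset(names)
print(len(train), len(test))  # 6 2
```

Splitting on the shared file stem, rather than on images and annotations separately, keeps each image paired with its annotation file.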

Anchor Generation


python3 genp -c config.json

Stores the custom bounding box priors at the path given in the config file under config['config_path']['anchors'], with the prefix 'custom_' added to the file name (so as not to accidentally overwrite the existing anchors).
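The YOLO9000 paper derives its priors by running k-means on the ground-truth box sizes, using IoU rather than Euclidean distance to pick the nearest cluster. The sketch below shows that idea in plain Python; it is an assumption that dourflow follows the same approach, and `kmeans_anchors`, its parameters, and the toy boxes are all made up for illustration.

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes sharing a corner, each given as (width, height)."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (width, height) pairs, assigning each box to the highest-IoU centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_w = sum(w for w, _ in cluster) / len(cluster)
                mean_h = sum(h for _, h in cluster) / len(cluster)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return sorted(centroids)


# Toy box sizes in grid-cell units: three small boxes and three large ones.
boxes = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (5.0, 5.5), (5.2, 4.8), (4.9, 5.1)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

On this toy data the two resulting anchors separate the small boxes from the large ones.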


Training creates a logs/ directory that stores the metrics and checkpoints of every training run.

The model passed with -m is used for transfer learning: the last layer of the network is re-initialized with random weights and resized to match the number of target classes.
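To make the resizing concrete: in YOLO v2 the final 1x1 convolution predicts, for every anchor, four box coordinates, one objectness score, and one score per class. The sketch below computes the resulting kernel shape and draws fresh random weights for it. The 1024 input channels, the 5 anchors, the 3-class dataset, and the Gaussian initialization are assumptions for this example; dourflow's actual code may differ.

```python
import random


def detection_layer_shape(in_channels, num_anchors, num_classes):
    """Kernel shape (height, width, in, out) of the 1x1 detection convolution."""
    out_channels = num_anchors * (4 + 1 + num_classes)
    return (1, 1, in_channels, out_channels)


def random_kernel(shape, scale=0.01, seed=0):
    """Flat list of Gaussian weights for a 1x1 kernel (illustrative initializer)."""
    rng = random.Random(seed)
    count = shape[2] * shape[3]
    return [rng.gauss(0.0, scale) for _ in range(count)]


coco_shape = detection_layer_shape(1024, num_anchors=5, num_classes=80)
custom_shape = detection_layer_shape(1024, num_anchors=5, num_classes=3)
print(coco_shape)    # (1, 1, 1024, 425)
print(custom_shape)  # (1, 1, 1024, 40)

# Only the last layer gets new weights; earlier layers keep the pretrained ones.
new_weights = random_kernel(custom_shape)
print(len(new_weights))  # 40960
```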

Example: python3 train -m models/logo/coco_model.h5 -c confs/config_custom.json

Then, in another terminal tab, you can run tensorboard --logdir=logs/run_X and open a browser page at http://localhost:6006/ to monitor the loss, mAP, and recall.

To Do

  • Multiclass Non Max Suppression
  • Anchor generation for custom datasets
  • mAP write up
  • Add webcam support
  • Data Augmentation
  • TensorBoard metrics

Inspired from