Semantic segmentation with Convolutional Neural Networks

Autonomous cars require strong perception systems. One of the methods to have this strong perception is the semantic segmentation of the elements in the road, using convolutional neural networks. In this project, use is made of the ERFNet architecture for the convolutional neural network and the database BDD100K to produce a semantic segmentation in real time for the following labels:

Pedestrians
Road Curb
White lane
Double yellow lane
Yellow lane
Small vehicles
Medium vehicles
Big vehicles
Drivable lane
Alternative lane

The algorithms developed for the lanes and obstacles detection were implemented using Python 2.7. This implementation add a weight matrix to keep a balance in between the classes at the training process. Also, now it's possible to start the training since the last epoch who has been trained.

When the training process ends the following files are generated:

automatedlog.txt: This file keep the values corresponding to the learning rate and the precision values in each epoch
best.txt: have the number of the epoch with the best learning rate
Results in each epoch: For each epoch there are generated 2 text files who indicates the learning rate for each class in training and validation
Best model of the network: save the model of the network who produces the best results and their parameters

Here are some of the examples of the images for the training set, after setting the required labels:

Finally, the best results were the following:

Classes	IOU%
Pedestrians	76.60
Road Curb	44.31
White Lane	32.53
Double yellow lane	63.38
Yellow lane	60.61
Small vehicles	37.70
Medium vehicles	74.38
Big vehicles	93.49
Drivable lane	88.24
Alternative lane	76.27

The IOU value, is a relation in between True Positives, False Positives and Falses Negatives, described by the next equation:

where:

TP: True positives
FP: False positives
FN: False negatives

The real-time testing was implemented using the framework ROS, in the Kinetic version, and rosbag files. The semantic segmentation in real-time looks as the following: