Using the KITTI Dataset to perform pixelwise classification of road images.
Object detection in images has been continuously advancing, with more efficient and accurate research papers being released every year. One of the most famous variants of object detection is semantic segmentation, which involves assigning every pixel to a particular class of object. To explain this further, look at the example below: every pixel in the output is given a different color depending on whether that pixel is a road, vehicle, traffic sign, tree, human, etc. This pixel-wise classification of images into different classes is known as semantic segmentation.
Conventional object detection involves drawing a box around the desired object, as shown in the example below:
-
If the goal of the classification is to identify the drivable portion of the road, drawing a box around it would not make sense, since a box cannot capture the actual region of the image where the drivable road is present.
-
Sometimes normal object detection can become too crowded and uncomfortable to read, as shown here:
- Semantic segmentation can obtain more accurate dimensions of the identified object, which makes it easier to perform further computer vision techniques on it, for example Snapchat filters.
In addition to the normal convolutions found in deep learning models, semantic segmentation also makes use of:
- 1x1 Convolutions
- Transposed Convolutions
- Skip Connections
- Transfer Learning
1x1 convolutions are basically normal convolutions with kernel size (1, 1) and stride (1, 1). Their operation is not very different from a fully connected (dense) layer applied at every pixel, and they may seem rather redundant, but they have several benefits:
- They are very computationally cheap to use.
- They can be used on images of any size, unlike dense layers, which require a fixed input size to work.
- 1x1 Convolutions are the simplest method for dimensionality reduction.
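The points above can be illustrated with a minimal sketch: a 1x1 convolution is just a per-pixel matrix multiply over the channel dimension, so it can shrink the channel count of a feature map of any spatial size. (The shapes below are illustrative, not taken from the project.)

```python
import numpy as np

def conv_1x1(feature_map, weights):
    """A 1x1 convolution: a per-pixel matrix multiply over channels.

    feature_map: (H, W, C_in), weights: (C_in, C_out).
    Spatial dimensions are untouched; only channels change.
    """
    return feature_map @ weights

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 256))   # feature map with 256 channels
w = rng.standard_normal((256, 64))     # learnable weights: 256 -> 64 channels
y = conv_1x1(x, w)
print(y.shape)  # (8, 8, 64): spatial size unchanged, channels reduced
```

Because the weights only act on the channel axis, the same `w` works whether the input is 8x8 or 800x800, which is exactly why dense layers can be swapped out for 1x1 convolutions.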
Transposed convolutions can be thought of as the opposite of normal convolutions. They upsample feature maps to a larger size, so that the downsampled, dense output of the network can be expanded back into a full-sized image.
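A naive sketch of the upsampling idea: each input pixel "stamps" a scaled copy of the kernel onto a larger output, spaced `stride` apart, which is the reverse of a strided convolution. (This is a toy single-channel implementation for illustration, not the project's actual layer.)

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Naive transposed convolution on a single-channel feature map.

    Each input value scales the kernel and adds it into the output
    at a position spaced `stride` apart -- the reverse of a strided
    convolution, producing a larger output than the input.
    """
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            out[i*stride:i*stride+k, j*stride:j*stride+k] += x[i, j] * kernel
    return out

x = np.ones((4, 4))        # small 4x4 feature map
kernel = np.ones((2, 2))   # 2x2 kernel with stride 2 doubles the resolution
y = transposed_conv2d(x, kernel, stride=2)
print(y.shape)  # (8, 8)
```

With a 2x2 kernel and stride 2 the stamps do not overlap, so a 4x4 input cleanly becomes an 8x8 output; in a real network the kernel values are learned rather than fixed.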
Skip connections are used to reuse information lost in the downsampling part of the network.
In the above image the outputs of Predict2 and DeConv1 are added to give the result of DeConv2; this recovers information lost during the convolutions and produces more accurate results. The downsampling part of the network is called the encoder and the upsampling part is called the decoder.
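The fusion step described above can be sketched very simply: an upsampled decoder layer and an encoder score map of the same shape are combined by element-wise addition. (The names `deconv1_up` and `predict2` mirror the figure; the shapes are illustrative assumptions.)

```python
import numpy as np

def skip_connection(decoder_up, encoder_scores):
    """Fuse an upsampled decoder output with an encoder score map
    of the same shape by element-wise addition (FCN-style skip)."""
    assert decoder_up.shape == encoder_scores.shape
    return decoder_up + encoder_scores

rng = np.random.default_rng(1)
deconv1_up = rng.standard_normal((16, 16, 2))  # upsampled DeConv1 output
predict2 = rng.standard_normal((16, 16, 2))    # per-pixel scores from encoder
deconv2 = skip_connection(deconv1_up, predict2)
print(deconv2.shape)  # (16, 16, 2)
```

The addition costs almost nothing, but it re-injects the finer spatial detail that the encoder's earlier layers still hold.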
Lastly, in this project I have used transfer learning, which reuses pretrained weights from the VGG model. I initialized the weights in the encoder to those of the already trained VGG model; this way I save training time by only having to train the decoder weights of the model.
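The effect of freezing the encoder can be sketched with a toy update step: pretrained parameters are marked non-trainable, so a gradient step moves only the decoder weights. (The parameter names and values are hypothetical; this is not the project's training loop.)

```python
import numpy as np

# Hypothetical parameter store: the encoder weights stand in for
# pretrained VGG weights, the decoder weights are freshly initialized.
params = {
    "encoder_w1": np.ones((3, 3)),   # frozen, as if loaded from VGG
    "decoder_w1": np.ones((3, 3)),   # trainable
}
trainable = {"encoder_w1": False, "decoder_w1": True}

def sgd_step(params, grads, lr=0.1):
    """Apply SGD only to trainable parameters; frozen ones stay fixed."""
    return {
        name: w - lr * grads[name] if trainable[name] else w
        for name, w in params.items()
    }

grads = {name: np.ones_like(w) for name, w in params.items()}
new_params = sgd_step(params, grads)
print(new_params["encoder_w1"][0, 0], new_params["decoder_w1"][0, 0])
# encoder weight unchanged at 1.0; decoder weight moved to 0.9
```

Since no gradients are applied to the encoder, each training step is cheaper and far fewer weights need to converge, which is the time saving the text refers to.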
Due to a lack of computing power I have not used the CityScapes dataset, but have instead used the KITTI dataset, which contains only a single class: the drivable portion of the road. Here are a few examples of my test outputs:
As you can see, the model has a few shortcomings of its own: it does not work well in some lighting conditions, a few images haven't been recognized well, and the recognition accuracy is not very high. Many improvements still need to be made to make the model more accurate while requiring fewer computational resources. To view more of my outputs, all of my test images are present in the runs folder.