In this project, we address the computer vision task of semantic segmentation using deep learning-based approaches. In the first task, we train the ENet architecture on the Pascal VOC 2012 dataset and obtain a mean IoU of 27.75% on the validation set. For the second and third tasks, we train the R2UNet and PSPNet architectures on the Cityscapes dataset and obtain mean IoUs of 33.65% and 75.15%, respectively, on the validation set.
The `.ipynb` file for Task 1 can be found under `task-1`. We did not obtain the desired plots for the evaluation metrics, nor proper visualizations of the output segmentation maps, in this notebook. We therefore provide evaluation metrics and output segmentation maps in our report using a different data loader (analogous to the ones we used for Tasks 2 and 3). The code for ENet using this data loader can be found in the git repository under the folder `task-2-3`.
To set up the conda environment, run:

```bash
conda env create -f environment.yml
```
The code structure is based on pytorch-template:

```
pytorch-template/
│
├── train.py - main script to start training
├── inference.py - inference using a trained model
├── trainer.py - the main trainer
├── config.json - holds configuration for training
│
├── base/ - abstract base classes
│   ├── base_data_loader.py
│   ├── base_model.py
│   ├── base_dataset.py - all the data augmentations are implemented here
│   └── base_trainer.py
│
├── dataloader/ - loading the data for different segmentation datasets
│
├── models/ - contains semantic segmentation models
│
├── saved/
│   ├── runs/ - trained models are saved here
│   └── log/ - default logdir for tensorboard and logging output
│
└── utils/ - small utility functions
    ├── losses.py - losses used in training the model
    ├── metrics.py - evaluation metrics used
    └── lr_scheduler - learning rate schedulers
```
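The mean IoU reported in the results below is, in the standard formulation, computed per class from a confusion matrix accumulated over the validation set and then averaged. A minimal sketch of that computation (illustrative only, not the exact code in `utils/metrics.py`):

```python
import numpy as np

def mean_iou(conf_matrix: np.ndarray) -> float:
    """Mean IoU from a (num_classes x num_classes) confusion matrix.

    conf_matrix[i, j] counts pixels of ground-truth class i
    that were predicted as class j.
    """
    intersection = np.diag(conf_matrix)                # true positives per class
    union = conf_matrix.sum(0) + conf_matrix.sum(1) - intersection
    iou = intersection / np.maximum(union, 1)          # guard against empty classes
    return float(iou.mean())
```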
- Pascal VOC: We download the original dataset and extract it to obtain the folder `VOCtrainval_11-May-2012/VOCdevkit/VOC2012`, which contains the image sets, the XML annotations for both object detection and segmentation, and the JPEG images. Following this, we use Semantic Contours from Inverse Detectors to augment the dataset: we navigate to `VOCtrainval_11-May-2012/VOCdevkit/VOC2012/ImageSets/Segmentation` and add the image sets (`train_aug`, `trainval_aug`, `val_aug`, and `test_aug`) downloaded from this link: Aug ImageSets. After this step, we add the new annotations, downloaded from SegmentationClassAug, to `VOCtrainval_11-May-2012/VOCdevkit/VOC2012`. We then use `VOCtrainval_11-May-2012` as the training path.
- CityScapes: From the official website cityscapes-dataset.com we download the images (`leftImg8bit_trainvaltest.zip`) and the annotations (fine: `gtFine_trainvaltest.zip`, and coarse: `gtCoarse.zip`), extract them into the same folder, and then specify this path in `config.json` for training (a path-check sketch follows this list).
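To catch layout mistakes early, the expected folders can be verified before training. A minimal sketch in Python; the `cityscapes` root name, the `SegmentationClassAug` folder, and the `.txt` extension of the image-set files are assumptions based on the steps above:

```python
from pathlib import Path

# Illustrative sanity check that the datasets follow the layout described above.
voc_root = Path("VOCtrainval_11-May-2012/VOCdevkit/VOC2012")
city_root = Path("cityscapes")  # assumed folder where the Cityscapes zips were extracted

for required in [
    voc_root / "JPEGImages",
    voc_root / "SegmentationClassAug",                      # assumed augmented-annotation folder
    voc_root / "ImageSets/Segmentation/train_aug.txt",      # assumed image-set file name
    city_root / "leftImg8bit",
    city_root / "gtFine",
]:
    status = "ok" if required.exists() else "MISSING"
    print(f"{status:7s} {required}")
```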
For training, we download the dataset and place it in the directory structure outlined above. Then we set the desired architecture, the training hyperparameters, and the correct path to the dataset directory in the `config.json` file. Following is an example command we run for training:
```bash
python train.py --config ./config-files/config_pspnet_cityscapes.json
```
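A quick way to see which fields a given run uses is to load the config in Python. The key names printed below (such as `arch` and `train_loader`) are assumptions about the template's schema, so check the actual files under `config-files/`:

```python
import json

# Illustrative: inspect the fields you typically edit before training.
with open("config-files/config_pspnet_cityscapes.json") as f:
    config = json.load(f)

print(config.get("name"))          # experiment name used for saved/ subfolders (assumed key)
print(config.get("arch"))          # model architecture section (assumed key)
print(config.get("train_loader"))  # dataset type, path, batch size (assumed key)
```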
The TensorBoard log files will be saved in `saved/runs` and the `.pth` model checkpoints in `saved/`.
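A saved checkpoint can later be loaded back for evaluation. A minimal sketch; the `state_dict` key and the checkpoint path are assumptions about the saved format, not confirmed from the repository:

```python
import torch

# Illustrative: load a saved checkpoint for evaluation.
# The "state_dict" key is an assumption about the checkpoint format.
checkpoint = torch.load("saved/best_model.pth", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)  # fall back to a raw state dict

# `model` would be an instance of the trained architecture, e.g. PSPNet:
# model.load_state_dict(state_dict)
# model.eval()
```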
We can monitor the training using TensorBoard by running:
```bash
tensorboard --logdir saved/runs/PSPNet/03-28_23-02
```
For inference, we need a trained PyTorch model, test images, and the config file used for training:
```bash
python inference.py --config config.json --model best_model.pth --images images_folder
```
The following parameters are available for inference:

- `--output`: The folder where the results will be saved (default: `outputs`).
- `--extension`: The extension of the images to segment (default: `jpg`).
- `--images`: Folder containing the images to segment.
- `--model`: Path to the trained model.
- `--mode`: Mode for inference, `multiscale` or `sliding`.
- `--config`: The config file used for training the model.
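As background on the two modes: `multiscale` typically averages predictions over several rescaled copies of the image, while `sliding` stitches together predictions from overlapping crops. A minimal sketch of the multiscale idea in PyTorch (illustrative, not the repository's actual implementation):

```python
import torch
import torch.nn.functional as F

def multiscale_predict(model, image, scales=(0.75, 1.0, 1.25)):
    """Average class probabilities over rescaled copies of the input.

    image: (1, 3, H, W) tensor; model is assumed to return raw logits.
    Returns (1, num_classes, H, W) averaged probabilities.
    """
    _, _, h, w = image.shape
    probs = 0.0
    with torch.no_grad():
        for s in scales:
            scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                                   align_corners=False)
            logits = model(scaled)
            # resize predictions back to the original resolution before averaging
            logits = F.interpolate(logits, size=(h, w), mode="bilinear",
                                   align_corners=False)
            probs = probs + F.softmax(logits, dim=1)
    return probs / len(scales)
```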
| Model | Backbone | Dataset | mIoU | Pretrained Weights | Tensorboard | Evaluation Metrics |
|---|---|---|---|---|---|---|
| ENet | - | Pascal VOC | 27.75% | Google Drive | Google Drive | Google Drive |
| R2UNet | - | Cityscapes | 33.65% | Google Drive | Google Drive | Google Drive |
| PSPNet | ResNet-50 | Cityscapes | 75.15% | Google Drive | Google Drive | Google Drive |
This code has been written for the Neural Networks: Theory and Implementation (NNTI) course project at Saarland University for Winter Semester 2020-21. Following are the contributors:
- Sohom Mukherjee (Student Number: 7010515)
- Shayari Bhattacharjee (Student Number: 7009998)