Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, and it is one of the most commonly used platforms in detection competitions. With it, it is easy to obtain good results in object detection and instance segmentation competitions. This repository contains the source code for the competitions listed below, and the purpose of open-sourcing it is to give novices some ideas.
DC: Traffic jam car detection 3rd
DF: Safety hat detection 3rd
DF: Video Segmentation Challenge 2nd
Kaggle: Airbus Ship Detection Challenge (partial solution) 23/884 Top 3%
Detectron provides Fast R-CNN, Faster R-CNN, FPN, RetinaNet, and Mask R-CNN. We can learn a lot from this series of work. Besides, Detectron also provides different kinds of backbone networks, which can be found under the configs folder.
Detectron also provides some detection tricks. By carefully reading config.py and the yaml files under the test_time_aug folder, you can find out how to enable them (a config sketch is given after the list below).
Soft-NMS
Box voting
TTA
Multi-scale training
Multi-scale testing
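The snippet below is a minimal sketch of how these tricks can be switched on programmatically instead of through a yaml file, assuming the standard Detectron (Caffe2) config keys; the scales and thresholds are placeholders, not the exact competition settings.

```python
# Minimal sketch: enabling Detectron's test-time tricks from Python.
# Assumes the standard Detectron (Caffe2) config keys; check config.py of
# your Detectron version, since key names may differ slightly.
from detectron.core.config import cfg

# Soft-NMS instead of hard NMS at test time.
cfg.TEST.SOFT_NMS.ENABLED = True
cfg.TEST.SOFT_NMS.METHOD = 'linear'      # or 'gaussian'

# Box voting: refine the kept boxes using the overlapping pre-NMS boxes.
cfg.TEST.BBOX_VOTE.ENABLED = True
cfg.TEST.BBOX_VOTE.VOTE_TH = 0.8         # placeholder threshold

# Test-time augmentation (TTA): horizontal flip + multi-scale testing.
cfg.TEST.BBOX_AUG.ENABLED = True
cfg.TEST.BBOX_AUG.H_FLIP = True
cfg.TEST.BBOX_AUG.SCALES = (400, 500, 600, 700, 800)    # placeholder scales
cfg.TEST.BBOX_AUG.SCALE_H_FLIP = True

# Multi-scale training: a scale is sampled from TRAIN.SCALES per image.
cfg.TRAIN.SCALES = (500, 600, 700, 800, 900)             # placeholder scales
```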
Official version (Caffe2): more reliable and faster to train and test, but it needs more GPU memory. Caffe2 is difficult to configure, has less documentation, and is hard to modify, so making changes can be difficult.
PyTorch version: may have some bugs and is slower to train and test compared with the official version, but its GPU memory optimization seems better. It is easy to deploy and has more documentation, so it is easier to make improvements.
PANet: does not show an obvious improvement.
Cascade-RCNN: performs better than Faster RCNN + FPN on the KITTI val set (split as in SubCNN).
In this section, I introduce the concrete methods adopted for each competition.
The purpose of this competition is to detect cars in pictures captured in traffic jams. Detailed information about the competition can be found on its website. Overall, it is an easy competition.
Since the dataset contains both daytime and nighttime scenes, apply a random gamma transformation to each picture (a sketch is given after this list).
H-flip
Multi-scale training: five scales
Faster RCNN + FPN, base network: X-101-64x4d
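Below is a minimal sketch of the random gamma transformation for the day/night brightness gap, using plain NumPy and OpenCV independently of Detectron; the gamma range is an illustrative assumption, not the exact value used in the competition.

```python
import numpy as np
import cv2


def random_gamma(image, gamma_range=(0.5, 2.0)):
    """Apply a random gamma correction to a uint8 image.

    gamma < 1 brightens dark (night) images and gamma > 1 darkens bright
    (day) images, so sampling gamma uniformly makes the detector less
    sensitive to the day/night illumination gap.
    """
    gamma = np.random.uniform(*gamma_range)
    # 256-entry lookup table for the gamma curve.
    table = ((np.arange(256) / 255.0) ** gamma * 255.0).astype(np.uint8)
    return cv2.LUT(image, table)


if __name__ == "__main__":
    img = cv2.imread("example.jpg")   # any training picture
    aug = random_gamma(img)           # randomly brightened / darkened
    cv2.imwrite("example_gamma.jpg", aug)
```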
H-flip
Multi-scale testing: larger scales can be adopted during inference than during training.
Soft-NMS: threshold 0.43.
Box voting: threshold 0.8.
A float representation of (x, y, w, h) is better than an int representation unless you apply further rounding tricks.
Enlarging the predicted boxes to 1.1 times their size can boost the score because of how the ground truth was hand-labeled (see the post-processing sketch after this list).
Snapping an x or y coordinate that is very close to 0 (or to the image width/height) exactly onto the border can also boost the score, again because of the hand labeling.
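A minimal sketch of these box post-processing steps in plain NumPy: the 1.1 enlargement factor comes from the text above, while the (x1, y1, x2, y2) box format and the border-snapping tolerance `snap_tol` are illustrative assumptions.

```python
import numpy as np


def postprocess_boxes(boxes, img_w, img_h, enlarge=1.1, snap_tol=2.0):
    """Enlarge predicted boxes and snap near-border coordinates.

    boxes: float array of shape (N, 4) in (x1, y1, x2, y2) format.
    Keeping the coordinates as floats (rather than rounding to int)
    preserves a little accuracy in the final submission.
    """
    boxes = boxes.astype(np.float64).copy()
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0
    w = (boxes[:, 2] - boxes[:, 0]) * enlarge
    h = (boxes[:, 3] - boxes[:, 1]) * enlarge

    boxes[:, 0] = cx - w / 2.0
    boxes[:, 1] = cy - h / 2.0
    boxes[:, 2] = cx + w / 2.0
    boxes[:, 3] = cy + h / 2.0

    # Snap coordinates that are very close to the image border onto it,
    # then clip everything back inside the image.
    boxes[:, 0][boxes[:, 0] < snap_tol] = 0.0
    boxes[:, 1][boxes[:, 1] < snap_tol] = 0.0
    boxes[:, 2][boxes[:, 2] > img_w - snap_tol] = float(img_w)
    boxes[:, 3][boxes[:, 3] > img_h - snap_tol] = float(img_h)
    return np.clip(boxes, [0, 0, 0, 0], [img_w, img_h, img_w, img_h])
```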
The purpose of this competition is to detect safety hats in pictures. Detailed information about the competition can be found on its website. Overall, it is also an easy competition.
The solution for safety hat detection is similar to the car detection one. The differences are the scales adopted during multi-scale training and multi-scale testing.
- Save the results of different models before NMS, then ensemble them and run a final soft-NMS (a sketch is given below).
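A minimal sketch of this multi-model ensemble for one image/class, with a plain NumPy linear soft-NMS; the IoU and score thresholds are illustrative assumptions (in practice Detectron's own Cython NMS utilities can be reused).

```python
import numpy as np


def soft_nms_linear(dets, iou_thresh=0.43, score_thresh=0.001):
    """Linear soft-NMS on dets of shape (N, 5): (x1, y1, x2, y2, score).

    Instead of discarding overlapping boxes, their scores are decayed by
    (1 - IoU) whenever the IoU with the current best box exceeds iou_thresh.
    """
    dets = dets.astype(np.float64).copy()
    keep = []
    while dets.shape[0] > 0:
        top = np.argmax(dets[:, 4])
        best = dets[top].copy()
        keep.append(best)
        dets = np.delete(dets, top, axis=0)

        # IoU of the remaining boxes with the current best box.
        xx1 = np.maximum(best[0], dets[:, 0])
        yy1 = np.maximum(best[1], dets[:, 1])
        xx2 = np.minimum(best[2], dets[:, 2])
        yy2 = np.minimum(best[3], dets[:, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_best = (best[2] - best[0]) * (best[3] - best[1])
        areas = (dets[:, 2] - dets[:, 0]) * (dets[:, 3] - dets[:, 1])
        iou = inter / (area_best + areas - inter + 1e-12)

        # Decay overlapping scores linearly, then drop very low scores.
        dets[:, 4] *= np.where(iou > iou_thresh, 1.0 - iou, 1.0)
        dets = dets[dets[:, 4] > score_thresh]
    return np.vstack(keep) if keep else np.zeros((0, 5))


def ensemble_models(per_model_dets):
    """Concatenate raw (pre-NMS) detections from several models for one
    image/class and run a single soft-NMS over the union."""
    return soft_nms_linear(np.vstack(per_model_dets))
```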
This is an instance segmentation competition, which is a bit more complicated than a pure detection challenge. The same task appeared on Kaggle as the CVPR 2018 WAD Video Segmentation Challenge before the DF: Video Segmentation Challenge (an interesting phenomenon), so you may also find some solutions on Kaggle.
The solution is similar to the ones above. The differences are the scales adopted during multi-scale training and multi-scale testing. Here, I introduce some new ideas:
- Crop the image from 3384 * 2710 to 2284 * 1210
- Change the RoI Align resolution for object detection from 7 * 7 to 14 * 14
- Change the RoI Align resolution for instance segmentation
- More training augmentation: it is easy to extend Detectron with H-flip, V-flip, brightness changes, or changes to other image attributes, but it is difficult to add random crop. You may do the random crop before training instead.
- Change the softmax classification loss to focal loss to balance the classes (see the sketch after this list).
- Change the cross-entropy loss (the instance segmentation loss) to Dice loss or another newer semantic segmentation loss (also sketched below).
- PANet may be better.
- Cascade-RCNN may be better. Cascade-RCNN performs about 0.1 AP higher than Faster RCNN + FPN, so it may also be a better solution for the other two object detection competitions.
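Minimal PyTorch-style sketches of the two loss replacements mentioned above (written against the PyTorch version of Detectron discussed earlier). The gamma, alpha, and smoothing values are illustrative assumptions, the scalar alpha is a simplification of the usual per-class weighting, and wiring the losses into the model heads is left out.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Multi-class focal loss as a drop-in for softmax cross-entropy.

    logits: (N, C) raw class scores, targets: (N,) int64 class labels.
    Down-weights easy examples by (1 - p_t)^gamma so rare classes and
    hard examples dominate the gradient.
    """
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    loss = -alpha * (1.0 - pt) ** gamma * log_pt
    return loss.mean()


def dice_loss(mask_logits, mask_targets, eps=1.0):
    """Soft Dice loss as a replacement for per-pixel binary cross-entropy.

    mask_logits: (N, H, W) raw mask scores, mask_targets: (N, H, W) in {0, 1}.
    """
    probs = torch.sigmoid(mask_logits)
    num = 2.0 * (probs * mask_targets).sum(dim=(1, 2)) + eps
    den = probs.sum(dim=(1, 2)) + mask_targets.sum(dim=(1, 2)) + eps
    return (1.0 - num / den).mean()
```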
The Airbus ship detection challenge