Baseline

This baseline is not exactly proposed by any single paper.

Method Overview

The segmentation baseline appends a lane existence head to a semantic segmentation network, following the classic multi-class segmentation approach. Its design originates from the SCNN paper (DeepLab with ResNet and VGG backbones); the SAD paper explored ENet and ERFNet; the RESA paper later reduced the network width for efficient ResNet baselines; and the BézierLaneNet paper (this framework) improved these baselines with modern training techniques and fair evaluations, further extending them to modern architectures such as Swin Transformer, RepVGG and MobileNets. Among them, the ERFNet baseline even achieves performance comparable to SOTA methods. However, these baselines are very sensitive to hyper-parameters; see the Wiki and Appendix B of the BézierLaneNet paper for more information. Specifically, the VGG16 backbone corresponds to DeepLab-LargeFOV in SCNN, while the ResNet and other backbones correspond to DeepLabV2 (without ASPP) with output channels reduced to 128, as in RESA. We sometimes refer to these models by their backbone names, for consistency with common practice.
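
A minimal sketch of this structure, assuming a ResNet18 backbone with the 128-channel reduction described above; the module names and the exact head wiring are illustrative, not this repo's code:

```python
# Sketch: segmentation backbone + per-pixel lane classifier + lane existence head.
import torch
import torch.nn as nn
import torchvision


class SegLaneBaseline(nn.Module):
    def __init__(self, num_lanes=4, channels=128):
        super().__init__()
        # ResNet feature extractor (DeepLabV2-style, no ASPP), with output
        # channels reduced to 128 as in RESA. Hypothetical wiring.
        resnet = torchvision.models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.reduce = nn.Conv2d(512, channels, kernel_size=1)
        # Per-pixel classifier: background + one class per lane.
        self.classifier = nn.Conv2d(channels, num_lanes + 1, kernel_size=1)
        # Lane existence head: one logit per lane.
        self.exist_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, num_lanes))

    def forward(self, x):
        feats = self.reduce(self.backbone(x))
        seg = self.classifier(feats)                    # (B, num_lanes+1, h, w)
        seg = nn.functional.interpolate(
            seg, size=x.shape[-2:], mode='bilinear', align_corners=False)
        exist = self.exist_head(feats)                  # (B, num_lanes) logits
        return seg, exist


x = torch.randn(1, 3, 288, 800)
seg, exist = SegLaneBaseline()(x)
print(seg.shape, exist.shape)  # torch.Size([1, 5, 288, 800]) torch.Size([1, 4])
```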

Results

Training time is estimated on a single 2080 Ti. All backbones use ImageNet pre-training unless noted otherwise; metrics are reported as the 3-run average and the best run. The precision column refers to training numerical precision (mixed or full fp32).

+ Measured on a single GTX 1080 Ti.

# No ImageNet pre-training.

* Trained on a 1080 Ti cluster with CUDA 9.0 and PyTorch 1.3; training time is estimated as: single 2080 Ti, mixed precision.

** Trained on two 2080 Ti.

TuSimple (test)

| backbone | aug | resolution | training time | precision | accuracy (avg) | accuracy | FP | FN | download |
|---|---|---|---|---|---|---|---|---|---|
| VGG16 | level 0 | 360 x 640 | 1.5h | mix | 93.79% | 93.94% | 0.0998 | 0.1021 | model \| shell |
| ResNet18 | level 0 | 360 x 640 | 0.7h | mix | 94.18% | 94.25% | 0.0881 | 0.0894 | model \| shell |
| ResNet34 | level 0 | 360 x 640 | 1.1h | mix | 95.23% | 95.31% | 0.0640 | 0.0622 | model \| shell |
| ResNet34 | level 1a | 360 x 640 | 1.2h* | full | 92.14% | 92.68% | 0.1073 | 0.1221 | model \| shell |
| ResNet50 | level 0 | 360 x 640 | 1.5h | mix | 95.07% | 95.12% | 0.0649 | 0.0653 | model \| shell |
| ResNet101 | level 0 | 360 x 640 | 2.6h | mix | 95.15% | 95.19% | 0.0619 | 0.0620 | model \| shell |
| ERFNet | level 0 | 360 x 640 | 0.8h | mix | 96.02% | 96.04% | 0.0591 | 0.0365 | model \| shell |
| ERFNet | level 1a | 360 x 640 | 0.9h* | full | 94.21% | 94.37% | 0.0846 | 0.0770 | model \| shell |
| ENet# | level 0 | 360 x 640 | 1h+ | mix | 95.55% | 95.61% | 0.0655 | 0.0503 | model \| shell |
| MobileNetV2 | level 0 | 360 x 640 | 0.5h | mix | 93.98% | 94.07% | 0.0792 | 0.0866 | model \| shell |
| MobileNetV3-Large | level 0 | 360 x 640 | 0.5h | mix | 92.09% | 92.18% | 0.1149 | 0.1322 | model \| shell |
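
For reference, the accuracy, FP and FN columns follow the standard TuSimple benchmark definitions: point-level accuracy, plus lane-level false-positive and false-negative rates. A schematic computation, assumed from the benchmark's definitions rather than taken from this repo's evaluator:

```python
# TuSimple-style summary metrics (schematic; inputs are aggregate counts).
def tusimple_metrics(correct_pts, gt_pts, wrong_lanes, pred_lanes,
                     missed_lanes, gt_lanes):
    accuracy = correct_pts / gt_pts      # lane points predicted within threshold
    fp_rate = wrong_lanes / pred_lanes   # false-positive lanes / predicted lanes
    fn_rate = missed_lanes / gt_lanes    # missed lanes / ground-truth lanes
    return accuracy, fp_rate, fn_rate
```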

CULane (test)

| backbone | aug | resolution | training time | precision | F1 (avg) | F1 | normal | crowded | night | no line | shadow | arrow | dazzle light | curve | crossroad | download |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VGG16 | level 0 | 288 x 800 | 9.3h | mix | 65.93 | 66.09 | 85.51 | 64.05 | 61.14 | 35.96 | 59.76 | 78.43 | 53.25 | 62.16 | 2224 | model \| shell |
| ResNet18 | level 0 | 288 x 800 | 5.3h | mix | 65.19 | 65.30 | 85.45 | 62.63 | 61.04 | 33.88 | 51.72 | 78.15 | 53.05 | 59.70 | 1915 | model \| shell |
| ResNet34 | level 0 | 288 x 800 | 7.3h | mix | 69.82 | 69.92 | 89.46 | 66.66 | 65.38 | 40.43 | 62.17 | 83.18 | 58.51 | 63.00 | 1713 | model \| shell |
| ResNet50 | level 0 | 288 x 800 | 12.4h | mix | 68.31 | 68.48 | 88.15 | 65.73 | 63.74 | 37.96 | 62.59 | 81.68 | 59.47 | 64.01 | 2046 | model \| shell |
| ResNet101 | level 0 | 288 x 800 | 20.0h | mix | 71.29 | 71.37 | 90.11 | 67.89 | 67.01 | 43.10 | 70.56 | 85.09 | 61.77 | 65.47 | 1883 | model \| shell |
| ERFNet | level 0 | 288 x 800 | 6h | mix | 73.40 | 73.49 | 91.48 | 71.27 | 68.09 | 46.76 | 74.47 | 86.09 | 64.18 | 66.89 | 2102 | model \| shell |
| ENet# | level 0 | 288 x 800 | 6.4h+ | mix | 69.39 | 69.90 | 89.26 | 68.15 | 62.99 | 42.43 | 68.59 | 83.10 | 58.49 | 63.23 | 2464 | model \| shell |
| MobileNetV2 | level 0 | 288 x 800 | 3.0h | mix | 67.34 | 67.41 | 87.82 | 65.09 | 61.46 | 38.15 | 57.34 | 79.29 | 55.89 | 60.29 | 2114 | model \| shell |
| MobileNetV3-Large | level 0 | 288 x 800 | 3.0h | mix | 68.27 | 68.42 | 88.20 | 66.33 | 63.08 | 40.41 | 56.15 | 79.81 | 59.15 | 61.96 | 2304 | model \| shell |
| RepVGG-A0 | level 0 | 288 x 800 | 3.3h** | mix | 70.22 | 70.56 | 89.74 | 67.68 | 65.21 | 42.51 | 67.85 | 83.13 | 60.86 | 63.63 | 2011 | model \| shell |
| RepVGG-A1 | level 0 | 288 x 800 | 4.1h** | mix | 70.73 | 70.85 | 89.92 | 68.60 | 65.43 | 41.99 | 66.64 | 84.78 | 61.38 | 64.85 | 2127 | model \| shell |
| RepVGG-B0 | level 0 | 288 x 800 | 6.2h** | mix | 71.77 | 71.81 | 90.86 | 69.32 | 66.68 | 43.53 | 67.83 | 85.43 | 59.80 | 66.47 | 2189 | model \| shell |
| RepVGG-B1g2 | level 0 | 288 x 800 | 10.0h** | mix | 72.08 | 72.20 | 90.85 | 69.31 | 67.94 | 43.81 | 68.45 | 85.85 | 60.64 | 67.69 | 2092 | model \| shell |
| RepVGG-B2 | level 0 | 288 x 800 | 13.2h** | mix | 72.24 | 72.33 | 90.82 | 69.84 | 67.65 | 43.02 | 72.08 | 85.76 | 61.75 | 67.67 | 2000 | model \| shell |
| Swin-Tiny | level 0 | 288 x 800 | 12.1h** | mix | 69.75 | 69.90 | 89.55 | 68.36 | 63.56 | 42.53 | 61.96 | 82.64 | 60.81 | 65.21 | 2813 | model \| shell |

LLAMAS (val)

| backbone | aug | resolution | training time | precision | F1 (avg) | F1 | TP | FP | FN | Precision | Recall | download |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VGG16 | level 0 | 360 x 640 | 9.3h | mix | 95.05 | 95.11 | 70263 | 3460 | 3772 | 95.31 | 94.91 | model \| shell |
| ResNet34 | level 0 | 360 x 640 | 7.0h | mix | 95.90 | 95.91 | 70841 | 2847 | 3194 | 96.14 | 95.69 | model \| shell |
| ERFNet | level 0 | 360 x 640 | 10.9h+ | mix | 95.94 | 96.13 | 71136 | 2830 | 2899 | 96.17 | 96.08 | model \| shell |

Test-set performance of these models can be found on the LLAMAS leaderboard.
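
The Precision, Recall and F1 columns follow the usual definitions over the TP/FP/FN counts above. As a sanity check (standard formulas, not this repo's evaluation code), the ERFNet row can be reproduced:

```python
# Precision/recall/F1 from raw TP/FP/FN counts.
def prf1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = prf1(tp=71136, fp=2830, fn=2899)
print(f"{p:.2%} {r:.2%} {f1:.2%}")  # 96.17% 96.08% 96.13%
```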

Profiling

FPS is the best trial-average among 3 trials on a 2080 Ti; post-processing is ignored. A timing sketch follows the table.

| backbone | resolution | FPS | FLOPs (G) | Params (M) |
|---|---|---|---|---|
| VGG16 | 360 x 640 | 56.36 | 214.50 | 20.37 |
| ResNet18 | 360 x 640 | 148.59 | 85.24 | 12.04 |
| ResNet34 | 360 x 640 | 79.97 | 159.60 | 22.15 |
| ResNet50 | 360 x 640 | 50.58 | 177.62 | 24.57 |
| ResNet101 | 360 x 640 | 27.41 | 314.36 | 43.56 |
| ERFNet | 360 x 640 | 85.87 | 26.32 | 2.67 |
| ENet | 360 x 640 | 56.63 | 4.26 | 0.95 |
| MobileNetV2 | 360 x 640 | 126.54 | 4.49 | 2.06 |
| MobileNetV3-Large | 360 x 640 | 104.34 | 3.63 | 3.30 |
| VGG16 | 288 x 800 | 55.31 | 214.50 | 20.15 |
| ResNet18 | 288 x 800 | 136.28 | 85.22 | 11.82 |
| ResNet34 | 288 x 800 | 72.42 | 159.60 | 21.93 |
| ResNet50 | 288 x 800 | 49.41 | 177.60 | 24.35 |
| ResNet101 | 288 x 800 | 27.19 | 314.34 | 43.34 |
| ERFNet | 288 x 800 | 88.76 | 26.26 | 2.68 |
| ENet | 288 x 800 | 57.99 | 4.12 | 0.96 |
| MobileNetV2 | 288 x 800 | 129.24 | 4.41 | 2.00 |
| MobileNetV3-Large | 288 x 800 | 107.83 | 3.56 | 3.25 |
| RepVGG-A0 | 288 x 800 | 162.61 | 207.81 | 9.06 |
| RepVGG-A1 | 288 x 800 | 117.30 | 339.83 | 13.54 |
| RepVGG-B0 | 288 x 800 | 103.68 | 390.83 | 15.09 |
| RepVGG-B1g2 | 288 x 800 | 36.91 | 1166.76 | 42.20 |
| RepVGG-B2 | 288 x 800 | 18.98 | 2310.13 | 81.23 |
| Swin-Tiny | 288 x 800 | 51.90 | 44.24 | 27.72 |
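
A rough version of this FPS protocol (warm-up, CUDA synchronization, best trial-average over repeated forward passes) might look like the sketch below. The iteration counts, default input shape and the parameter counter are assumptions for illustration, not this repo's exact profiling script.

```python
# Timing sketch: forward passes only (no post-processing), best of 3 trials.
import time
import torch


@torch.no_grad()
def measure_fps(model, size=(1, 3, 288, 800), iters=300, warmup=50, trials=3):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.eval().to(device)
    x = torch.randn(size, device=device)
    best = 0.0
    for _ in range(trials):
        for _ in range(warmup):       # warm-up: cudnn autotuning, allocator caches
            model(x)
        if device.type == 'cuda':
            torch.cuda.synchronize()  # drain queued kernels before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device.type == 'cuda':
            torch.cuda.synchronize()
        best = max(best, iters / (time.perf_counter() - start))
    return best


def count_params_m(model):
    # "Params (M)" column: total parameter count in millions.
    return sum(p.numel() for p in model.parameters()) / 1e6

# Example usage with the earlier sketch model:
# print(measure_fps(SegLaneBaseline()), count_params_m(SegLaneBaseline()))
```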

Citation (if you have to)

@inproceedings{pan2018spatial,
  title={Spatial as Deep: Spatial CNN for Traffic Scene Understanding},
  author={Pan, Xingang and Shi, Jianping and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
  booktitle={AAAI},
  year={2018}
}

@inproceedings{feng2022rethinking,
  title={Rethinking Efficient Lane Detection via Curve Modeling},
  author={Feng, Zhengyang and Guo, Shaohua and Tan, Xin and Xu, Ke and Wang, Min and Ma, Lizhuang},
  booktitle={CVPR},
  year={2022}
}