This is Wenlong's master thesis at VITA Lab, EPFL. In this work, we are interested in perceiving humans, a fundamental and critical category for any autonomous vehicle operating alongside pedestrians (from social robots to self-driving cars). Note that our definition of human generalizes to pedestrians and any other category involving humans in the publicly available KITTI dataset, such as person sitting.
Python 3 is required. Clone this repository, then run:
pip install numpy cython imgaug pycocotools
pip install torch==1.4.0 torchvision==0.5.0
python setup.py install
cd openpsf/correlation_package
python setup.py install
Please note that the correlation module requires a CUDA-capable GPU, so the model cannot run on a Mac.
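A quick sanity check that a CUDA device is visible before building the correlation extension (a minimal sketch, not part of this repo):

```python
# Minimal sanity check: the correlation package is a CUDA extension, so PyTorch
# must see a GPU before it can be built and run.
import torch

if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; the correlation module cannot run here.")
```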
We load pretrained pifpaf weights for the 2D pose detection. Please download the pretrained weights from pifpaf.
Please keep the folder name openpsf unchanged, since the pretrained pifpaf weights are assigned according to the folder name.
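For reference, a minimal PyTorch sketch of how such a checkpoint is typically reused is shown below; the file name and the stand-in backbone are assumptions for illustration, since in this repo the `--pretrained` flag of the training command below handles the loading.

```python
# Hedged sketch (not the repo's actual loading code): reusing pretrained PifPaf
# weights in PyTorch. File name and backbone are placeholders for illustration.
import torch
import torchvision

# Stand-in for the network (pifpaf builds on a torchvision ResNet backbone).
model = torchvision.models.resnet152(pretrained=False)

checkpoint = torch.load("pifpaf_pretrained.pt", map_location="cpu")  # hypothetical file
state_dict = checkpoint if isinstance(checkpoint, dict) else checkpoint.state_dict()

# strict=False copies the parameters whose names match and ignores the rest,
# so newly added heads can start from random initialization.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(len(missing), "missing keys;", len(unexpected), "unexpected keys")
```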
python3 -m openpsf.train --momentum=0.95 --epochs=20 --lr-decay 10 20 --batch-size=3 --basenet=resnet152block5 --quad=1 --headnets pif paf psf --square-edge=401 --regression-loss=laplace --lambdas 30 2 2 50 3 3 50 3 3 --crop-fraction=0.5 --pretrained <path to the pretrained pifpaf model>
The pretrained model weights for person localization can be found on Google Drive. The model with the correlation module is psf_corr. If you want to use the model without the correlation module, replace all heads_corr in nets.py with head_psf; the corresponding pretrained model is psf_no_corr.
We give an example to illustrate the usage of our model with the KITTI dataset. There is no need to retrain for your own dataset; you can fine-tune the hyperparameters in association_pair.py (i.e. the camera parameter ratio k and the confidence threshold score) to better match your data, as sketched below.
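To give intuition for these hyperparameters, the sketch below uses standard stereo geometry; the exact role of k and score inside association_pair.py is an assumption here, not a description of the repo's code.

```python
# Hedged sketch of standard stereo geometry, illustrating what a "camera
# parameter ratio" k typically encodes (focal length times baseline). The exact
# formula in association_pair.py may differ; treat this as an assumption.

def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth in meters from disparity (px), focal length (px) and baseline (m)."""
    k = focal_px * baseline_m      # camera parameter ratio
    return k / disparity_px

# KITTI-like camera values: focal ~721 px, baseline ~0.54 m
print(round(stereo_depth(disparity_px=20.0, focal_px=721.0, baseline_m=0.54), 1))  # ~19.5 m

# A confidence threshold ("score") would simply drop low-confidence detections
# before the left-right association step, e.g.:
detections = [{"id": 0, "score": 0.9}, {"id": 1, "score": 0.1}]
kept = [d for d in detections if d["score"] > 0.2]   # hypothetical threshold
print(kept)
```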
python3 -m openpsf.predict --help
ALP | Type | error < 0.5 m | error < 1 m | error < 2 m |
---|---|---|---|---|
PSF stereo | Stereo | 47.6% | 56.9% | 63.2% |
MonoLoco | Mono | 27.6% | 47.8% | 66.2% |
3DOP | Stereo | 41.5% | 54.5% | 63.0% |
MonoDepth | Mono | 19.1% | 33.0% | 47.5% |
ALE (m) | Type | Easy | Moderate | Hard |
---|---|---|---|---|
PSF stereo | Stereo | 0.50(0.59) | 0.59(0.72) | 0.73(0.65) |
3DOP | Stereo | 0.54(0.72) | 0.85(1.13) | 1.56(1.65) |
MonoLoco | Mono | 0.85(0.88) | 0.97(1.23) | 1.14(1.49) |
MonoDepth | Mono | 1.40(1.69) | 2.19(2.98) | 2.31(3.77) |
The paper was published at ICRA 2020. If you use, compare with, or refer to this work, please cite:
@INPROCEEDINGS{9197069,
author={W. {Deng} and L. {Bertoni} and S. {Kreiss} and A. {Alahi}},
booktitle={2020 IEEE International Conference on Robotics and Automation (ICRA)},
title={Joint Human Pose Estimation and Stereo 3D Localization},
year={2020},
volume={},
number={},
pages={2324-2330},
doi={10.1109/ICRA40945.2020.9197069}}