3D Human Motion Estimation via Motion Compression and Refinement
Zhengyi Luo, S. Alireza Golestaneh, Kris M. Kitani
ACCV 2020, Oral
[Project website][Quantitative Demo][10min Talk]
MEVA (Motion Estimation via Variational Autoencoding) is a video-based 3D human pose estimation method that focuses on producing stable and natural-looking human motion from videos. MEVA achieves state-of-the-art human pose estimation accuracy while significantly reducing acceleration error. Please refer to our paper for more details.
- November 11, 2020 – 14:16 Inference code finished.
- Tested OS: Linux
- Python >= 3.6
Install the dependencies:
pip install -r requirements.txt
To prepare the data needed to run the pre-trained models, run the script:
bash scripts/prepare_data.sh
Command:
python scripts/run_meva_on_video.py --cfg train_meva_2 --vid_file zen_talking_phone.mp4 --output_folder results/output --exp train_meva_2
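To process several videos, the command above can be wrapped in a small script. The following is a minimal sketch, not part of this repo: the helper names `build_meva_command` and `run_on_videos` are hypothetical, and it assumes the script is invoked from the repository root with exactly the flags shown above.

```python
import subprocess
from pathlib import Path


def build_meva_command(vid_file, output_folder="results/output",
                       cfg="train_meva_2", exp="train_meva_2"):
    """Build the argument list for the inference command shown above.

    Hypothetical helper: it only reuses the flags from the README command.
    """
    return [
        "python", "scripts/run_meva_on_video.py",
        "--cfg", cfg,
        "--vid_file", str(vid_file),
        "--output_folder", str(output_folder),
        "--exp", exp,
    ]


def run_on_videos(video_dir, output_folder="results/output", dry_run=True):
    """Build (and optionally run) one command per .mp4 file in video_dir."""
    videos = sorted(Path(video_dir).glob("*.mp4"))
    commands = [build_meva_command(v, output_folder) for v in videos]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if inference fails.
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default here) the function only returns the commands, which makes it easy to inspect them before launching inference.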
Training code coming soon!
Coming soon!
Here we compare MEVA with recent state-of-the-art methods on 3D pose estimation datasets. The evaluation metric is Procrustes-Aligned Mean Per Joint Position Error (PA-MPJPE) in mm; lower is better.
Models | 3DPW ↓ | MPI-INF-3DHP ↓ | H36M ↓ |
---|---|---|---|
SPIN | 59.2 | 67.5 | 41.1 |
Temporal HMR | 76.7 | 89.8 | 56.8 |
VIBE | 56.5 | 63.4 | 41.5 |
MEVA | 51.9 | 62.6 | 48.1 |
(The numbers here reflect the current state of this repo, so they might differ from those in the paper. I made a couple of small improvements to the code, so it achieves better performance.)
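For readers unfamiliar with the metric, PA-MPJPE can be sketched as follows. This is a generic NumPy illustration under stated assumptions, not this repo's evaluation code: the function names `procrustes_align` and `pa_mpjpe` are hypothetical, and the inputs are assumed to be single-frame joint arrays of shape `(J, 3)` in the same units (e.g. mm). The prediction is first aligned to the ground truth with the best similarity transform (scale, rotation, translation), and the mean per-joint Euclidean distance is then computed.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the optimal similarity transform."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    x, y = pred - mu_pred, gt - mu_gt      # centered joints

    # SVD of the 3x3 cross-covariance gives the optimal rotation (Kabsch).
    u, s, vt = np.linalg.svd(x.T @ y)
    v = vt.T
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(v @ u.T) >= 0 else -1.0
    signs = np.array([1.0, 1.0, d])
    rot = v @ np.diag(signs) @ u.T

    # Optimal isotropic scale (assumes pred is not a single repeated point).
    scale = (s * signs).sum() / (x ** 2).sum()
    return scale * x @ rot.T + mu_gt


def pa_mpjpe(pred, gt):
    """Mean per-joint position error after Procrustes alignment."""
    aligned = procrustes_align(pred, gt)
    return np.linalg.norm(aligned - gt, axis=1).mean()
```

Because the alignment removes global scale, rotation, and translation, a prediction that differs from the ground truth only by such a transform has a PA-MPJPE of (numerically) zero; what remains measures errors in the articulated pose itself.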
Eval code coming soon!
- Visualization scale seems off somehow (the humanoid is not scaled properly); still debugging!
If you find our work useful in your research, please cite our paper MEVA:
@article{Luo20203DHM,
title={3D Human Motion Estimation via Motion Compression and Refinement},
author={Zhengyi Luo and S. Golestaneh and Kris M. Kitani},
journal={ArXiv},
year={2020},
volume={abs/2008.03789}
}
Note that this repo builds upon a number of great previous works (especially VIBE) and borrows scripts from them for convenience. Since MEVA focuses on using a VAE pre-trained on AMASS to break down human pose estimation into its coarse-to-fine elements, the visual training part is heavily borrowed from VIBE. For each borrowed file, we indicate its source; please adhere to the original licenses for usage.
- Dataloaders, parts of the loss function, and data pre-processing are from: VIBE
- SMPL models and layer are from: SMPL-X model
- Feature extractors are from: SPIN
- NN modules are from (khrylib): DLOW