This readme contains the instructions for downloading, extracting and cleaning the dataset used in the ACTOR paper. It also provides the steps to compute the pose predictions, deep features and instance features required for the ACTOR agent. All pose predictions are pre-computed to speed up RL training.
Main requirements are Python 3.6 and a CUDA-enabled GPU for computing OpenPose features using Tensorflow (see details below).
ACTOR assumes that the dataset and the associated cache of pose predictions are provided using a certain structure.
There are two important folders:
- The data folder that contains the raw images and Panoptic annotations (used for evaluating the 3d error).
- The cache folder that contains the OpenPose predictions, deep features and instance features used for matching.
Internally, these two folders mirror each other: both contain a second level of folders that separates the scenes into train, validation and test splits.
Example:
```
data/
  train/
    scene1/
  val/
    scene2/
  test/
    scene3/
cache/
  train/
    scene1/
  val/
    scene2/
  test/
    scene3/
```
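Once both folders are populated, the mirrored layout can be sanity-checked programmatically. A minimal sketch (the function name is ours, not part of the toolchain):

```python
from pathlib import Path

def mirrored_splits(data_root, cache_root):
    """Return the split folders (train/val/test) present in both
    data/ and cache/ -- a quick sanity check of the mirrored layout."""
    data = {p.name for p in Path(data_root).iterdir() if p.is_dir()}
    cache = {p.name for p in Path(cache_root).iterdir() if p.is_dir()}
    return sorted(data & cache)
```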
You'll need around 500-800 GB of free space on your target drive. You may split the `data/` and `cache/` folders across different drives.
- Create your data folder (e.g. `data/`) somewhere. Do not create any subfolders.
- Create your cache folder (e.g. `cache/`) somewhere. Then create `cache/test`, `cache/train` and `cache/val`.
- Go into the `load_config.m` file and set the `dataset_path` and `dataset_cache` flags to the corresponding paths (they should point to `data/` and `cache/`, respectively).
- Download the scenes and extract annotations and images:

  ```shell
  ./downloadAll.sh <data-folder-path>
  ```

- Clean up the scenes to remove bad camera frames and ensure that each scene has a fixed number of persons:

  ```shell
  python3 clean_up_panoptic.py --check-same-nbr-frames --check-green-frame --hd --same-people --split-scene --min-con 100 --delete --min-nbr-people 1 <data-folder-path>
  ```

- Split the data into the official train, validation and test splits by running:

  ```shell
  bash split_data.sh <data-folder-path>
  ```
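As an aside, the `--same-people` idea in the clean-up step can be pictured with a small sketch. This is an illustration only — the real logic lives in `clean_up_panoptic.py` — keeping just the frames whose set of person ids matches the most common set in the scene:

```python
from collections import Counter

def filter_same_people(frames):
    """Keep only frames whose set of person ids matches the most common
    id set across all frames (illustrative only; the actual filtering
    is done by clean_up_panoptic.py)."""
    id_sets = [frozenset(f["ids"]) for f in frames]
    most_common, _ = Counter(id_sets).most_common(1)[0]
    return [f for f, s in zip(frames, id_sets) if s == most_common]
```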
To predict the 2d joints and deep features we use a TensorFlow port of OpenPose based on this example. Your computer will need to support CUDA 9.0, or you'll need to install the `tensorflow` Python package instead of `tensorflow-gpu` to compute all features on the CPU.
- Go to `openpose-tf/`. In the following steps all paths are relative to that folder.
- Download weights to `models/`.
- Install Python 3.6 and the requirements in `requirements.txt` (or use Pipenv to do this automatically).
- For each split (train, val, test), compute the 2d pose predictions and deep features:
```shell
python3 cache_pano.py --panopticpath <data-folder-path>/train --cachepath <cache-folder-path>/train
python3 cache_pano.py --panopticpath <data-folder-path>/val --cachepath <cache-folder-path>/val
python3 cache_pano.py --panopticpath <data-folder-path>/test --cachepath <cache-folder-path>/test
```
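The three invocations can also be scripted. A minimal sketch (the helper names are ours; paths are placeholders) that builds one `cache_pano.py` command per split and runs them in order:

```python
import subprocess
import sys

def build_commands(data_root, cache_root, splits=("train", "val", "test")):
    """Build one cache_pano.py invocation per split."""
    return [
        [sys.executable, "cache_pano.py",
         "--panopticpath", f"{data_root}/{split}",
         "--cachepath", f"{cache_root}/{split}"]
        for split in splits
    ]

def cache_all(data_root, cache_root):
    """Run the caching for every split, stopping on the first failure."""
    for cmd in build_commands(data_root, cache_root):
        subprocess.run(cmd, check=True)
```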
- Merge and bilinearly resize the deep features for each cache split using the Matlab script `resize_merge.m`:

  ```matlab
  resize_merge('<cache-folder-path>/train')
  resize_merge('<cache-folder-path>/val')
  resize_merge('<cache-folder-path>/test')
  ```
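`resize_merge.m` itself is Matlab; purely as an illustration of the bilinear resizing it performs per feature channel, here is a dependency-free sketch (align-corners convention assumed, which may differ from the Matlab script):

```python
def bilinear_resize(grid, out_h, out_w):
    """Bilinearly resize a 2d list-of-lists (an illustrative stand-in
    for what resize_merge.m does to each feature channel)."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map the output row into input coordinates (align corners).
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = min(int(y), in_h - 2) if in_h > 1 else 0
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = min(int(x), in_w - 2) if in_w > 1 else 0
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate horizontally on both rows, then vertically.
            top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
            bot = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
            row.append(top * (1 - dy) + bot * dy)
        out.append(row)
    return out
```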
Next we want to generate the instance features used for matching people in the scene by appearance. The model is first trained for 40k iterations on the training split, then fine-tuned for 2k iterations on each individual scene.
The base of the instance features comes from a VGG-19 model.
- Download the VGG-19 weights (sha1: `7e1441c412647bebdf7ae9750c0c9aba131a1601`).
- Either run the 40k base training on the train split or download the pre-trained weights:
  - Train weights from scratch using the Matlab script: `run_train_instance_detector('train')`
  - Download the pre-trained 40k iteration weights (sha1: `6727771807b0984f2f3bbed2cf4e0a2af80b396f`).
- Generate the fine-tuned weights for each split:

  ```matlab
  run_generate_finetuned_instance_cache('<path-40k-instance-weights>', 'train', '<path-to-vgg19-weights>', 2000)
  run_generate_finetuned_instance_cache('<path-40k-instance-weights>', 'val', '<path-to-vgg19-weights>', 2000)
  run_generate_finetuned_instance_cache('<path-40k-instance-weights>', 'test', '<path-to-vgg19-weights>', 2000)
  ```
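The sha1 digests listed above can be checked before training. A small helper (the filename in the commented example is a placeholder):

```python
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the sha1 hex digest of a file, reading in chunks so
    large weight files do not have to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example: compare against the digest listed above.
# assert sha1_of_file("vgg19.caffemodel") == "7e1441c412647bebdf7ae9750c0c9aba131a1601"
```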
The official Caffe implementation of OpenPose supports predicting joints for the face, hands and feet. When running ACTOR at test time, these can be used in the 3d reconstruction instead of the 15 joint representation. To compute these features for Panoptic, follow the steps below from the `openpose-caffe/` folder.
- Clone the official OpenPose repository into `openpose_caffe/openpose`.
- To ensure the same results, check out commit `1e4a7853572e491c5ec0afac4288346c9004065f`.
- Build Caffe with Python support (see the official documentation here and here).
- Install Python 3.5 and the requirements in `requirements.txt` (or use Pipenv to do this automatically).
- Generate the full-body cache by running:

  ```shell
  python cache_panoptic.py <data-folder-path>/train <cache-folder-path>/train
  python cache_panoptic.py <data-folder-path>/val <cache-folder-path>/val
  python cache_panoptic.py <data-folder-path>/test <cache-folder-path>/test
  ```
Below is in-depth information about the dataset and the scripts.
- `downloadAll.sh` downloads, extracts and verifies all scenes.
- `downloadScene.sh` downloads a scene with all videos and annotations.
- `extractScene.sh` extracts images from the videos, removes the videos and extracts the annotations. The video frames are subsampled by a provided frequency, and the annotations are then pruned to match the frames. Finally, any Coco19 annotations are converted to MPII for all scenes, if they exist.
- `subsample.sh` removes all but every nth file in a directory.
- `vgaImgsExtractor.sh` / `hdImgsExtractor.sh` extract image frames from the video, then call subsample on the resulting frames.
- `verifyScene.sh` checks the content of the dataset.
- `clean_up_panoptic.py` removes bad frames and frames missing annotations.
- `discard_annotations.py` removes annotations to match the subsampled frames.
- `coco2mpii.py` converts the Coco19 annotations to the MPII 15 joint format.
- `openpose-tf/resize_merge.m` scales down the feature blobs and merges them into one file per scene.
- `openpose-caffe/cache_panoptic.py` generates full-body 2d joint predictions instead of the 15 joint version.
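As an illustration of the subsampling behaviour, here is a Python equivalent of keeping every nth file of the sorted listing (the shell script is what the pipeline actually runs, and whether it starts from the first file is up to that script):

```python
def subsample_files(filenames, n):
    """Return the files kept by subsampling: every nth file of the
    sorted listing, starting from the first (illustrative equivalent
    of subsample.sh, which deletes the rest on disk)."""
    return sorted(filenames)[::n]
```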
- Panoptic dataset scripts adapted from https://github.com/CMU-Perceptual-Computing-Lab/panoptic-toolbox
- OpenPose TensorFlow implementation is from https://arvrjourney.com/human-pose-estimation-using-openpose-with-tensorflow-part-1-7dd4ca5c8027