Since we use SynchronizedBatchNorm, use as few GPUs as your GPU memory allows; otherwise the cross-GPU synchronization overhead will make training significantly slower.
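For example, assuming the training scripts use every GPU that PyTorch can see, you can limit the number of GPUs by setting CUDA_VISIBLE_DEVICES when launching (the GPU ids below are only illustrative):
# use only two GPUs for training (ids are illustrative)
CUDA_VISIBLE_DEVICES=0,1 sh scripts/train_synthetic_sceneflow.sh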
Download the FlyingThings3D_subset (not the regular FlyingThings3D), Driving, and Monkaa data from https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html. Suppose the downloaded data are placed under a SceneFlow folder. We expect the following folder structure.
SceneFlow
|-- Driving
| |-- disparity
| |-- frames_cleanpass
| |-- optical_flow
|-- FlyingThings3D_subset
| |-- train
| | |-- disparity
| | |-- disparity_occlusions
| | |-- flow
| | |-- flow_occlusions
| | |-- image_clean
| |-- val
| | |-- disparity
| | |-- disparity_occlusions
| | |-- flow
| | |-- flow_occlusions
| | |-- image_clean
|-- Monkaa
| |-- disparity
| |-- frames_cleanpass
| |-- optical_flow
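A quick sanity check that the layout above is in place (this only verifies a few of the expected directories):
for d in Driving/frames_cleanpass FlyingThings3D_subset/train Monkaa/frames_cleanpass; do
    [ -d "SceneFlow/$d" ] && echo "found $d" || echo "MISSING $d"
done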
Configure the datasets' paths accordingly in sense/datasets/dataset_catlog.py.
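If you are unsure which entries need editing, listing the hard-coded paths in the catalog file is a reasonable starting point (this assumes the paths are stored as quoted absolute strings; adjust the pattern if not):
grep -n "'/" sense/datasets/dataset_catlog.py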
Run sh scripts/train_synthetic_sceneflow.sh to start training the model from scratch. This model jointly estimates optical flow, stereo disparity, and occlusions.
Download both optical flow (http://sintel.is.tue.mpg.de/downloads) and stereo disparity (http://sintel.is.tue.mpg.de/stereo) data first. We expect the following folder structure.
|-- MPI_Sintel
| |-- training
| |-- test
| |-- stereo
| | |-- training
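As with SceneFlow, a quick check that the folders listed above exist:
for d in training test stereo/training; do
    [ -d "MPI_Sintel/$d" ] && echo "found $d" || echo "MISSING $d"
done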
After configuring the dataset's path in sense/datasets/dataset_catlog.py, we are ready to fine-tune the model trained on the synthetic SceneFlow dataset in the previous section. Simply run sh scripts/train_mpi_sintel.sh.
Fine-tuning on KITTI2012 and KITTI2015 is less straightforward. It consists of a couple of steps.
Again, download the data for KITTI2012 (http://www.cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=flow) and KITTI2015 (http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php) first. In particular, for KITTI2015, we also need the camera calibration data. We expect the following folder structure.
KITTI2012
|-- training
|-- testing
KITTI2015
|-- training
|-- testing
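To confirm the KITTI2015 calibration data is in place (assuming the calibration archive from the KITTI website extracts to calib_cam_to_cam under training/ and testing/; adjust the folder name if your download differs):
ls KITTI2015/training/calib_cam_to_cam | head
ls KITTI2015/testing/calib_cam_to_cam | head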
Semi-supervised training of SENSE requires pseudo ground-truth data for occlusion detection and semantic segmentation. For occlusion, we use the output of the model that is pre-trained on the synthetic SceneFlow dataset as pseudo ground-truth. For semantic segmentation, we use the model output from a ResNet101-UPerNet that is first trained on CityScapes and then fine-tuned on KITTI.
First, clone the UPerNet code from https://github.com/playerkk/semantic-segmentation-pytorch. After configuring the variables in scripts/prepare_pseudo_gt.sh, simply run sh scripts/prepare_pseudo_gt.sh. Note that this step may require around 53GB of disk space to store the results (mainly for semantic segmentation).
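Since the pseudo ground truth takes roughly 53GB, it is worth checking the free space on the target drive before running the script:
# replace the path with the output directory configured in scripts/prepare_pseudo_gt.sh
df -h /path/to/pseudo_gt_output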
Then configure PSEUDO_OCC_GT_DIR and PSEDU_SEG_GT_DIR accordingly in the scripts/train_kitti_holistic_model.sh file.
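The edit amounts to pointing the two variables at the directories produced by the previous step, roughly like this (the paths below are placeholders, not the actual defaults):
# inside scripts/train_kitti_holistic_model.sh; paths are placeholders
PSEUDO_OCC_GT_DIR=/path/to/pseudo_gt/occlusion
PSEDU_SEG_GT_DIR=/path/to/pseudo_gt/segmentation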
Run sh scripts/train_kitti_holistic_model.sh to train a model for a set of holistic scene understanding tasks, including estimation of optical flow and stereo disparity, occlusion detection, and semantic segmentation.
We then need to train another neural network for warped disparity refinement, which is essential for obtaining scene flow estimation results. Run sh scripts/train_kitti2015_warped_disp_refine.sh.