Stereo DNN TensorRT inference library

The goal of this project is to enable inference for NVIDIA Stereo DNN TensorFlow models on Jetson, as well as on other platforms supported by the NVIDIA TensorRT library. A video demo of inference on the KITTI dataset is available here.

Running a model is a two-step process:

1. Convert the TensorFlow model to a TensorRT C++ API model. This step is performed only once per model and can be done in any environment, such as the user's desktop.
2. Use the TRT C++ API model in an application. Once the model is built, it can be used in any environment (e.g. Jetson) to perform inference.

Note: TensorFlow is not required for the inference step. The library needs only basic components such as CUDA 9.0, cuDNN 7.0 and TensorRT 3.0, so it runs as-is on Jetson with JetPack 3.2.

The library implements the following TensorRT plugins:

conv3d: TensorFlow-compatible 3D convolution
conv3d_transpose: TensorFlow-compatible 3D transposed convolution (aka deconvolution)
cost_volume: cost volume computation used in StereoDNN
softargmax: the specific softargmax variant used in StereoDNN
elu: ELU activation function
transform: tensor transformation required for certain operations
slice: tensor slicing required for certain operations
pad: tensor padding required for certain operations

Note that these plugins make certain assumptions that are valid in the case of Stereo DNN. The slice and pad plugins implement only the small subset of functionality required to run the inference.
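For readers unfamiliar with two of the operations listed above, the following NumPy sketch shows the textbook definitions of ELU and softargmax. It is a reference illustration only, not the plugin code: the function names, the `alpha` default and the choice of reduction axis are assumptions, and the StereoDNN-specific softargmax variant may differ in details such as sign or axis layout.

```python
import numpy as np


def elu(x, alpha=1.0):
    """Textbook ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=np.float32)
    # np.minimum keeps exp() from overflowing on large positive inputs.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


def softargmax(scores, axis=0):
    """Generic softargmax: softmax-weighted average of indices along `axis`.

    Assumption: index i along `axis` stands for disparity i.
    """
    scores = np.asarray(scores, dtype=np.float32)
    # Subtract the max for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=axis, keepdims=True)
    # Build an index vector that broadcasts along `axis`.
    shape = [1] * scores.ndim
    shape[axis] = scores.shape[axis]
    indices = np.arange(scores.shape[axis], dtype=np.float32).reshape(shape)
    return (weights * indices).sum(axis=axis)
```

Unlike a hard argmax, the weighted average is differentiable and can return fractional values, which is why it is a common choice for regressing disparity.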

Models

Several Stereo DNN models are included in this package; the following table provides a brief comparison. TF stands for TensorFlow and TRT for our implementation based on TensorRT and cuDNN. All times are in milliseconds per image, averaged over 200 images.

Model	Input size	Titan Xp (TF), ms	Titan Xp (TRT), ms	Jetson TX2 (TRT), ms	D1 error (%)
NVSmall	1025x321	800	450	7800	9.8
NVTiny	513x161	75	40	360	11.12
ResNet-18	1025x321	950	650	11000	3.4(*)
ResNet-18 2D	513x257	15	9	110	9.8

Notes:

We could not run TensorFlow on Jetson with our models, so no measurements were made for that case.
The D1 error for NVSmall and NVTiny was measured on the 200 training images from the KITTI 2015 stereo benchmark. This dataset was not used to train the models.
(*) Measured on the KITTI 2015 stereo test set. This model was fine-tuned on the 200 training images, so reporting the error on that dataset would not be meaningful.
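To make the timings easier to compare, the short script below derives the TensorFlow-to-TensorRT speedup on Titan Xp and the frames per second implied by each TRT timing. The numbers are copied from the table; the dictionary keys and the function name are illustrative choices, not part of the library.

```python
# Timings (ms per image) copied from the table above.
TIMINGS_MS = {
    "NVSmall": {"tf_titan": 800, "trt_titan": 450, "trt_tx2": 7800},
    "NVTiny": {"tf_titan": 75, "trt_titan": 40, "trt_tx2": 360},
    "ResNet-18": {"tf_titan": 950, "trt_titan": 650, "trt_tx2": 11000},
    "ResNet-18 2D": {"tf_titan": 15, "trt_titan": 9, "trt_tx2": 110},
}


def summarize(timings):
    """Return {model: (TF/TRT speedup on Titan Xp, Titan Xp FPS, TX2 FPS)}."""
    summary = {}
    for model, t in timings.items():
        speedup = t["tf_titan"] / t["trt_titan"]
        # 1000 ms per second divided by ms per image gives images per second.
        summary[model] = (speedup, 1000.0 / t["trt_titan"], 1000.0 / t["trt_tx2"])
    return summary


if __name__ == "__main__":
    for model, (speedup, fps_titan, fps_tx2) in summarize(TIMINGS_MS).items():
        print(f"{model}: {speedup:.2f}x over TF, "
              f"{fps_titan:.1f} FPS on Titan Xp, {fps_tx2:.2f} FPS on TX2")
```

By these numbers, TensorRT is roughly 1.5x to 1.9x faster than TensorFlow on Titan Xp, and ResNet-18 2D runs at about 9 frames per second on Jetson TX2 (1000 / 110).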

Converting TensorFlow model to TensorRT C++ API model

To convert a TensorFlow model to the TensorRT C++ API, run the ./scripts/model_builder.py script, which takes the following named parameters:

model type
network name to use in generated C++ code
path to TensorFlow model
path to resulting binary weights file
path to generated C++ file.

You can also optionally specify the model data type (fp32 or fp16; fp32 is the default).

Example:

cd ./scripts/
python ./model_builder.py --model_type nvsmall --net_name NVSmall1025x321 --checkpoint_path=../models/NVSmall/TensorFlow/model-inference-1025x321-0 --weights_file=../models/NVSmall/TensorRT/trt_weights.bin --cpp_file=../sample_app/nvsmall_1025x321_net.cpp --data_type fp32

Currently, the supported model types are nvsmall and resnet18. NVTiny is a slight variation of NVSmall, so it works with the nvsmall model type. Adding new model types should be relatively easy; ./scripts/model_nvsmall.py or ./scripts/model_resnet18.py can provide a good starting point.
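When converting several models, it can help to assemble the command line programmatically. The sketch below builds the argument list using only the flags shown in the example above. The helper name `model_builder_cmd` and its validation are hypothetical, and it assumes the script accepts the space-separated `--flag value` form for every flag.

```python
SUPPORTED_MODEL_TYPES = ("nvsmall", "resnet18")
SUPPORTED_DATA_TYPES = ("fp32", "fp16")


def model_builder_cmd(model_type, net_name, checkpoint_path, weights_file,
                      cpp_file, data_type="fp32",
                      script="./model_builder.py"):
    """Build the argument list for model_builder.py (hypothetical helper)."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError("unsupported model type: %r" % (model_type,))
    if data_type not in SUPPORTED_DATA_TYPES:
        raise ValueError("unsupported data type: %r" % (data_type,))
    return [
        "python", script,
        "--model_type", model_type,
        "--net_name", net_name,
        "--checkpoint_path", checkpoint_path,
        "--weights_file", weights_file,
        "--cpp_file", cpp_file,
        "--data_type", data_type,
    ]


if __name__ == "__main__":
    # NVTiny uses the nvsmall model type; the net name here is illustrative.
    cmd = model_builder_cmd(
        "nvsmall", "NVSmall1025x321",
        "../models/NVSmall/TensorFlow/model-inference-1025x321-0",
        "../models/NVSmall/TensorRT/trt_weights.bin",
        "../sample_app/nvsmall_1025x321_net.cpp")
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.check_call` with `cwd` set to the scripts directory, which avoids shell quoting issues with paths.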

Note: TensorFlow v1.5 or later is required. We strongly recommend using our TensorFlow Docker container, as it contains all components required to use Stereo DNN with TensorFlow.

Building inference code

Once the TensorRT C++ model is created, it can be used in any TensorRT-enabled application. The static inference library nvstereo_inference, located at ./lib/, contains the implementation of the TensorRT plugins required to run Stereo DNN. A sample application located at ./sample_app/ provides an example of library usage. To build the library, sample application and tests, run the following commands:

# Build debug:
mkdir build
cd ./build/
cmake -DCMAKE_BUILD_TYPE=Debug ..
make
# Build release:
cd ..
mkdir build_rel
cd ./build_rel/
cmake -DCMAKE_BUILD_TYPE=Release ..
make

If you get a CMake error saying that GTest is not found, do the following:

cd /usr/src/gtest
cmake CMakeLists.txt
make

and then try building the library again (you may need to use sudo depending on your environment).

The build places the binary files in the ./bin/ directory.

It's a good idea to run the tests first to make sure everything is working as expected:

./bin/nvstereo_tests_debug ./tests/data

All tests should pass. We recommend running the debug version first to make sure all asserts in the code are enabled.

To run the sample application:

./bin/nvstereo_sample_app_debug nvsmall 513 161 ./models/NVTiny/TensorRT/trt_weights.bin ./sample_app/data/img_left.png ./sample_app/data/img_right.png ./bin/disp.bin

The app takes up to 8 parameters (width and height are passed separately, and the last parameter is optional):

model type (nvsmall or resnet18)
dimensions of the image (width and height - must be equal to dimensions of network input)
path to weights file created by model builder script
2 images, left and right (e.g. PNG files)
path to output file, the app will create 2 files: binary and PNG
[optional] data type (fp32 or fp16). Note that FP16 implementation in cuDNN is currently not optimized for 3D convolutions so results might be worse than FP32.
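The parameter order above can be captured in a small helper that assembles the positional arguments and appends the optional data type only when it is given. This is a sketch for scripting the sample app: the function name `sample_app_args` and the checks it performs are hypothetical and not part of the library.

```python
def sample_app_args(model_type, width, height, weights, left, right, output,
                    data_type=None):
    """Return the positional arguments for the sample app, in the listed order."""
    if model_type not in ("nvsmall", "resnet18"):
        raise ValueError("unsupported model type: %r" % (model_type,))
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Width comes before height, matching the example invocation.
    args = [model_type, str(width), str(height), weights, left, right, output]
    if data_type is not None:
        if data_type not in ("fp32", "fp16"):
            raise ValueError("unsupported data type: %r" % (data_type,))
        args.append(data_type)
    return args


if __name__ == "__main__":
    args = sample_app_args(
        "nvsmall", 513, 161,
        "./models/NVTiny/TensorRT/trt_weights.bin",
        "./sample_app/data/img_left.png",
        "./sample_app/data/img_right.png",
        "./bin/disp.bin")
    print(" ".join(["./bin/nvstereo_sample_app_debug"] + args))
```

Without a data type the list has 7 entries, as in the example command; passing `data_type="fp16"` yields all 8.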

As with the tests, we recommend running the debug version of the sample app first so that all asserts in the code are enabled.

The following scripts demonstrate how to properly read and pre-process images for the Stereo DNN:

Using OpenCV (a C++ version is available in sample_app as well):

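import numpy as np
import cv2

# Read the image; OpenCV returns BGR data in HWC layout.
img = cv2.imread('left.png')
if img is None:
    raise IOError('Could not read left.png')
# Resize to the network input size, given as (width, height).
img = cv2.resize(img, (1025, 321), interpolation=cv2.INTER_AREA)
# Convert BGR to RGB and then HWC to CHW.
img = np.transpose(img[:, :, ::-1], [2, 0, 1]).astype(np.float32)
# Scale pixel values to [0, 1].
img /= 255.0
print(img.shape)
# Write the raw float32 data.
with open('left.bin', 'wb') as w:
    img.reshape(-1).tofile(w)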

Using TensorFlow:

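import numpy as np
import tensorflow as tf

# Decode the PNG and convert uint8 [0, 255] to float32 [0, 1].
img = tf.image.decode_png(tf.read_file('left.png'), dtype=tf.uint8)
img = tf.image.convert_image_dtype(img, tf.float32)
# Resize to the network input size, given as [height, width].
img = tf.image.resize_images(img, [321, 1025], tf.image.ResizeMethod.AREA)
# Tensor.eval() needs a default session (TensorFlow 1.x graph mode).
with tf.Session():
    # Convert HWC to CHW.
    img_res = np.transpose(img.eval(), [2, 0, 1])
with open('left.bin', 'wb') as w:
    img_res.reshape(-1).tofile(w)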

Note that because TF and OpenCV implement the resizing algorithm differently, the results will not be byte-wise equal.
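Since the two outputs are not byte-wise equal, a tolerance-based comparison is more useful than a file diff. The sketch below reads a raw float32 file back into CHW shape and reports the largest per-element difference. The helper names and the file names `left_cv.bin` and `left_tf.bin` are hypothetical; the default shape assumes the 1025x321 input used in the scripts above.

```python
import numpy as np


def load_chw(path, height=321, width=1025, channels=3):
    """Read a raw float32 file back into (channels, height, width) shape."""
    data = np.fromfile(path, dtype=np.float32)
    expected = channels * height * width
    if data.size != expected:
        raise ValueError("expected %d floats, found %d" % (expected, data.size))
    return data.reshape(channels, height, width)


def max_abs_diff(a, b):
    """Largest per-element difference between two preprocessed images."""
    return float(np.max(np.abs(a - b)))
```

For example, `max_abs_diff(load_chw('left_cv.bin'), load_chw('left_tf.bin'))` gives a single number to judge how far the OpenCV and TensorFlow results diverge. The size check in `load_chw` also catches a file that was written for a different network input size.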
