Skip to content

Guide to deploying deep-learning inference networks and realtime object detection with TensorRT and Jetson TX1.

Notifications You must be signed in to change notification settings

wangjuenew/jetson-inference

 
 

Repository files navigation

Alt text

Deploying Deep Learning

Welcome to NVIDIA's deep learning inference workshop and end-to-end realtime object recognition library for Jetson TX1.

Included in this repo are resources for efficiently deploying neural networks using NVIDIA TensorRT.

Vision primitives, such as imageNet for image recognition and detectNet for object localization, inherit from the shared tensorNet object. Examples are provided for streaming from live camera feed and processing images from disk. The actions to understand and apply these are represented as ten easy-to-follow steps.

Ten Steps to Deep Learning

  1. What's Deep Learning?
  2. Getting TensorRT
  3. Building from Source
  4. Digging Into the Code
  5. Classifying Images with ImageNet
  6. Running the Live Camera Recognition Demo
  7. Re-training the Network with Customized Data
  8. Locating Object Coordinates using DetectNet
  9. Running the Live Camera Detection Demo
  10. Re-training DetectNet with DIGITS

note: this branch is verified against JetPack 2.3 / L4T R24.2 aarch64 (Ubuntu 16.04 LTS)

Other branches available: JetPack 2.2 / L4T R24.1 aarch64 (Ubuntu 14.04 LTS)

What's Deep Learning?

New to deep neural networks (DNNs) and machine learning? Take this introductory primer on training and inference.

Using NVIDIA deep learning tools, it's easy to Get Started training DNNs and deploying them with high performance.

NVIDIA DIGITS is used to interactively train network models on annotated datasets in the cloud or PC, while TensorRT and Jetson are used to deploy runtime inference in the field. Together, DIGITS and TensorRT form an effective workflow for developing and deploying deep neural networks capable of implementing advanced AI and perception.

To get started, see the DIGITS Getting Started guide and then the next section of the tutorial, Getting TensorRT.

Getting TensorRT

NVIDIA TensorRT is a new library available in JetPack 2.3 for optimizing and deploying production DNN's. TensorRT performs a host of graph optimizations and takes advantage of half-precision FP16 support on TX1 to achieve up to 2X or more performance improvement versus Caffe:

And in a benchmark conducted measuring images/sec/Watts, with TensorRT Jetson TX1 is up to 20X more power efficient than traditional CPUs at deep-learning inference. See this Parallel ForAll article for a technical overview of the release.

To obtain TensorRT, download the latest JetPack to your PC and re-flash your Jetson (see Jetson TX1 User Guide).

Building from Source

Provided along with this repo are TensorRT-enabled examples of running Googlenet/Alexnet on live camera feed for image recognition, and pedestrian detection networks with localization capabilities (i.e. that provide bounding boxes).

The latest source can be obtained from GitHub and compiled onboard Jetson TX1.

note: this branch is verified against JetPack 2.3 / L4T R24.2 aarch64 (Ubuntu 16.04 LTS)

1. Cloning the repo

To obtain the repository, navigate to a folder of your choosing on the Jetson. First, make sure git and cmake are installed locally:

sudo apt-get install git cmake

Then clone the jetson-inference repo:

git clone http://github.org/dusty-nv/jetson-inference

2. Configuring

When cmake is run, a special pre-installation script (CMakePreBuild.sh) is run and will automatically install any dependencies.

mkdir build
cd build
cmake ../

3. Compiling

Make sure you are still in the jetson-inference/build directory, created above in step #2.

cd jetson-inference/build			# omit if pwd is already /build from above
make

Depending on architecture, the package will be built to either armhf or aarch64, with the following directory structure:

|-build
   \aarch64		    (64-bit)
      \bin			where the sample binaries are built to
      \include		where the headers reside
      \lib			where the libraries are build to
   \armhf           (32-bit)
      \bin			where the sample binaries are built to
      \include		where the headers reside
      \lib			where the libraries are build to

binaries residing in aarch64/bin, headers in aarch64/include, and libraries in aarch64/lib.

Digging Into the Code

For reference, see the available vision primitives, including imageNet for image recognition and detectNet for object localization.

/**
 * Image recognition with GoogleNet/Alexnet or custom models, using TensorRT.
 */
class imageNet : public tensorNet
{
public:
	/**
	 * Network choice enumeration.
	 */
	enum NetworkType
	{
		ALEXNET,
		GOOGLENET
	};

	/**
	 * Load a new network instance
	 */
	static imageNet* Create( NetworkType networkType=GOOGLENET );
	
	/**
	 * Load a new network instance
	 * @param prototxt_path File path to the deployable network prototxt
	 * @param model_path File path to the caffemodel
	 * @param mean_binary File path to the mean value binary proto
	 * @param class_info File path to list of class name labels
	 * @param input Name of the input layer blob.
	 */
	static imageNet* Create( const char* prototxt_path, const char* model_path, const char* mean_binary,
							 const char* class_labels, const char* input="data", const char* output="prob" );

	/**
	 * Determine the maximum likelihood image class.
	 * @param rgba float4 input image in CUDA device memory.
	 * @param width width of the input image in pixels.
	 * @param height height of the input image in pixels.
	 * @param confidence optional pointer to float filled with confidence value.
	 * @returns Index of the maximum class, or -1 on error.
	 */
	int Classify( float* rgba, uint32_t width, uint32_t height, float* confidence=NULL );
};

Both inherit from the shared tensorNet object which contains common TensorRT code.

Classifying Images with ImageNet

There are multiple types of deep learning networks available, including recognition, detection/localization, and soon segmentation. The first deep learning capability to highlight is image recognition using an 'imageNet' that's been trained to identify similar objects.

The imageNet object accept an input image and outputs the probability for each class. Having been trained on ImageNet database of 1000 objects, the standard AlexNet and GoogleNet networks are downloaded during step 2 from above.

After building, first make sure your terminal is located in the aarch64/bin directory:

$ cd jetson-inference/build/aarch64/bin

Then, classify an example image with the imagenet-console program. imagenet-console accepts 2 command-line arguments: the path to the input image and path to the output image (with the class overlay printed).

$ ./imagenet-console orange_0.jpg output_0.jpg

$ ./imagenet-console granny_smith_1.jpg output_1.jpg

Next, we will use imageNet to classify a live video feed from the Jetson onboard camera.

Running the Live Camera Recognition Demo

Similar to the last example, the realtime image recognition demo is located in /aarch64/bin and is called imagenet-camera. It runs on live camera stream and depending on user arguments, loads googlenet or alexnet with TensorRT.

$ ./imagenet-camera googlenet           # to run using googlenet
$ ./imagenet-camera alexnet             # to run using alexnet

The frames per second (FPS), classified object name from the video, and confidence of the classified object are printed to the openGL window title bar. By default the application can recognize up to 1000 different types of objects, since Googlenet and Alexnet are trained on the ILSVRC12 ImageNet database which contains 1000 classes of objects. The mapping of names for the 1000 types of objects, you can find included in the repo under data/networks/ilsvrc12_synset_words.txt

Re-training the Network with Customized Data

The existing GoogleNet and AlexNet models that are downloaded by the repo are pre-trained on 1000 classes of objects.

What if you require a new object class to be added to the network, or otherwise require a different organization of the classes?
Using NVIDIA DIGITS, networks can be fine-tuned or re-trained from a pre-exisiting network model. After installing DIGITS on a PC or in the cloud (such as an AWS instance), see the Image Folder Specification to learn how to organize the data for your particular application. Popular training databases with various annotations and labels include ImageNet, MS COCO, and Google Images among others. See here for a crawler script that will download the 1000 original classes, including as many of the original images that are still available online.

Then, while creating the new network model in DIGITS, copy the GoogleNet prototxt and specify the existing GoogleNet caffemodel as the DIGITS Pretrained Model:

The network training should now converge faster than if it were trained from scratch. After the desired accuracy has been reached, copy the new model checkpoint back over to your Jetson and proceed as before, but now with the added classes available for recognition.

Locating Object Coordinates using DetectNet

The previous image recognition examples output class probabilities representing the entire input image. The second deep learning capability to highlight is detecting multiple objects, and finding where in the video those objects are located (i.e. extracting their bounding boxes). This is performed using a 'detectNet' - or object detection / localization network.

The detectNet object accepts as input the 2D image, and outputs a list of coordinates of the detected bounding boxes. Three example detection network models are are automatically downloaded during the repo source configuration:

  1. ped-100 (single-class pedestrian detector)
  2. multiped-500 (multi-class pedestrian + baggage detector)
  3. facenet-120 (single-class facial recognition detector)

To process test images with detectNet and TensorRT, use the detectnet-console program. detectnet-console accepts command-line arguments representing the path to the input image and path to the output image (with the bounding box overlays rendered). Some test images are included with the repo:

$ ./detectnet-console peds-007.png output-7.png

To change the network that detectnet-console uses, modify detectnet-console.cpp (beginning line 33):

detectNet* net = detectNet::Create( detectNet::PEDNET_MULTI );	 // uncomment to enable one of these 
//detectNet* net = detectNet::Create( detectNet::PEDNET );
//detectNet* net = detectNet::Create( detectNet::FACENET );

Then to recompile, navigate to the jetson-inference/build directory and run make.

Multi-class Object Detection

When using the multiped-500 model (PEDNET_MULTI), for images containing luggage or baggage in addition to pedestrians, the 2nd object class is rendered with a green overlay.

$ ./detectnet-console peds-008.png output-8.png

Running the Live Camera Detection Demo

Similar to the previous example, detectnet-camera runs the object detection networks on live video feed from the Jetson onboard camera. Launch it from command line along with the type of desired network:

$ ./detectnet-camera multiped-500        # run using multi-class pedestrian/luggage detector
$ ./detectnet-camera ped-100             # run using original single-class pedestrian detector
$ ./detectnet-camera facenet-120         # run using facial recognition network
$ ./detectnet-camera                     # by default, program will run using multiped-500

note: to achieve maximum performance while running detectnet, increase the Jetson TX1 clock limits by running the script: sudo ~/jetson_clocks.sh

Re-training DetectNet with DIGITS

For a step-by-step guide to training custom DetectNets, see the Object Detection example included in DIGITS version 4:

The DIGITS guide above uses the KITTI dataset, however MS COCO also has bounding data available for a variety of objects.

Extra Resources

In this area, links and resources for deep learning developers are listed:

About

Guide to deploying deep-learning inference networks and realtime object detection with TensorRT and Jetson TX1.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C++ 69.6%
  • Cuda 22.3%
  • C 4.5%
  • CMake 2.7%
  • Shell 0.9%