This is a guide to NVIDIA's inference and embedded deep-vision runtime library for the Jetson TX1.
This repository contains resources for efficiently deploying neural networks into the field using NVIDIA TensorRT.
Vision primitives, such as imageNet for image recognition, detectNet for object detection, and segNet for segmentation, inherit from the shared tensorNet object. Samples are provided for processing live camera streams and images from disk. The path to understanding and applying these is summarized in the following ten steps:
- What is Deep Learning?
- Getting JetPack 2.3 / TensorRT
- Building from Source
- Digging Into the Code
- Classify Images with ImageNet
- Run the Live Camera Recognition Demo
- Re-train the Network with Customized Data
- Locate Object Coordinates using DetectNet
- Run the Live Camera Detection Demo
- Re-train DetectNet with DIGITS
Recommended System Requirements
Training GPU: Maxwell or Pascal generation TITAN-X, Tesla M40, Tesla P40, or AWS P2 instance. Ubuntu 14.04 x86_64 or Ubuntu 16.04 x86_64 (see the DIGITS AWS AMI image).
Deployment: Jetson TX1 Developer Kit with JetPack 2.3 or newer (Ubuntu 16.04 aarch64).
Note: this branch has been verified with the Jetson TX1 and the following BSP combinations:
> JetPack 2.3 / L4T R24.2 aarch64 (Ubuntu 16.04 LTS)
> JetPack 2.3.1 / L4T R24.2.1 aarch64 (Ubuntu 16.04 LTS)
Note that there is a separate branch for JetPack 2.2 / L4T R24.1 aarch64 (Ubuntu 14.04 LTS).
The TensorRT samples included in the repository are intended for the embedded Jetson TX1 module; however, if cuDNN and TensorRT are installed on the host, the samples can also be compiled with TensorRT on a host PC.
If terms like deep neural networks (DNNs) and machine learning are new to you, see this introductory primer.
NVIDIA's deep learning tools make it easy to get started training DNNs and to deploy them into the field with high performance.
NVIDIA DIGITS is a tool for interactively training network models on labeled datasets in the cloud or on a PC, while TensorRT and Jetson are used to deploy the inference runtime into the field. Used together, DIGITS and TensorRT provide a highly effective workflow for developing and deploying deep neural networks capable of advanced AI and perception.
After reading the DIGITS Getting Started guide, continue with the next chapter of this tutorial, Getting JetPack 2.3 / TensorRT.
Install the latest DIGITS on a host PC with an NVIDIA GPU or on a cloud service. See developer.nvidia.com/digits for pre-built Docker images or the Amazon Machine Image (AMI).
NVIDIA TensorRT is a new library, available starting with JetPack 2.3, for optimizing and deploying production-level DNNs. With extensive graph optimizations and the half-precision FP16 support available on Tegra X1, TensorRT delivers twice the performance of existing Caffe.
In benchmarks measuring performance per watt (images/sec/Watt), Jetson TX1 running TensorRT was shown to be 20x more power-efficient than conventional CPUs. See this Parallel ForAll blog post for a technical overview.
To obtain TensorRT, download the latest JetPack to your PC and re-flash your Jetson (see the Jetson TX1 User Guide for instructions).
This repository provides sample programs built on TensorRT, including running GoogleNet/AlexNet on live camera feeds for image recognition, and detecting pedestrians and drawing bounding boxes around them.
The latest source code can be obtained from GitHub and compiled on the Jetson TX1.
Note: this branch has been verified against the following combination:
JetPack 2.3 / L4T R24.2 aarch64 (Ubuntu 16.04 LTS)
To obtain the repository, navigate to a folder of your choosing on the Jetson. First, make sure git and cmake are installed locally:
sudo apt-get install git cmake
Then clone the jetson-inference repo:
git clone https://github.com/dusty-nv/jetson-inference
When cmake is run, a special pre-installation script (CMakePreBuild.sh) is run and will automatically install any dependencies.
cd jetson-inference
mkdir build
cd build
cmake ../
Make sure you are still in the jetson-inference/build directory, created above in step #2.
cd jetson-inference/build # omit if pwd is already /build from above
make
Depending on architecture, the package will be built to either armhf or aarch64, with the following directory structure:
|-build
\aarch64 (64-bit)
\bin where the sample binaries are built to
\include where the headers reside
\lib where the libraries are built to
\armhf (32-bit)
\bin where the sample binaries are built to
\include where the headers reside
\lib where the libraries are built to
On the Jetson TX1, binaries reside in aarch64/bin, headers in aarch64/include, and libraries in aarch64/lib.
For reference, see the available vision primitives, including imageNet
for image recognition and detectNet
for object localization.
/**
 * Image recognition with GoogleNet/Alexnet or custom models, using TensorRT.
 */
class imageNet : public tensorNet
{
public:
	/**
	 * Network choice enumeration.
	 */
	enum NetworkType
	{
		ALEXNET,
		GOOGLENET
	};

	/**
	 * Load a new network instance
	 */
	static imageNet* Create( NetworkType networkType=GOOGLENET );

	/**
	 * Load a new network instance
	 * @param prototxt_path File path to the deployable network prototxt
	 * @param model_path File path to the caffemodel
	 * @param mean_binary File path to the mean value binary proto
	 * @param class_labels File path to list of class name labels
	 * @param input Name of the input layer blob.
	 * @param output Name of the output layer blob.
	 */
	static imageNet* Create( const char* prototxt_path, const char* model_path, const char* mean_binary,
	                         const char* class_labels, const char* input="data", const char* output="prob" );

	/**
	 * Determine the maximum likelihood image class.
	 * @param rgba float4 input image in CUDA device memory.
	 * @param width width of the input image in pixels.
	 * @param height height of the input image in pixels.
	 * @param confidence optional pointer to float filled with confidence value.
	 * @returns Index of the maximum class, or -1 on error.
	 */
	int Classify( float* rgba, uint32_t width, uint32_t height, float* confidence=NULL );
};
Both inherit from the shared tensorNet
object which contains common TensorRT code.
There are multiple types of deep learning networks available, including recognition, detection/localization, and soon segmentation. The first deep learning capability to highlight is image recognition using an 'imageNet' that's been trained to identify similar objects.
The imageNet
object accepts an input image and outputs the probability for each class. Having been trained on the ImageNet database of 1000 objects, the standard AlexNet and GoogleNet networks are downloaded during step 2 from above.
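As a rough illustration of how the pieces above fit together (essentially what imagenet-console does), here is a minimal sketch. Only imageNet::Create() and Classify() come from the header excerpt above; the loadImage.h header and its loadImageRGBA() helper are assumed from the repo's utility code:

#include "imageNet.h"
#include "loadImage.h"   // assumed repo utility providing loadImageRGBA()

#include <cstdio>

int main( int argc, char** argv )
{
	// load the default GoogleNet model with TensorRT
	imageNet* net = imageNet::Create();

	if( !net )
		return 0;

	// load the test image into shared CPU/GPU memory as float4 RGBA
	float* imgCPU  = NULL;
	float* imgCUDA = NULL;
	int    imgWidth = 0, imgHeight = 0;

	if( !loadImageRGBA("orange_0.jpg", (float4**)&imgCPU, (float4**)&imgCUDA, &imgWidth, &imgHeight) )
		return 0;

	// classify the image and print the predicted class index + confidence
	float confidence = 0.0f;
	const int imgClass = net->Classify(imgCUDA, imgWidth, imgHeight, &confidence);

	if( imgClass >= 0 )
		printf("recognized class %i (confidence %f)\n", imgClass, confidence);

	delete net;
	return 0;
}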
After building, first make sure your terminal is located in the aarch64/bin directory:
$ cd jetson-inference/build/aarch64/bin
Then, classify an example image with the imagenet-console
program. imagenet-console
accepts 2 command-line arguments: the path to the input image and path to the output image (with the class overlay printed).
$ ./imagenet-console orange_0.jpg output_0.jpg
$ ./imagenet-console granny_smith_1.jpg output_1.jpg
Next, we will use imageNet to classify a live video feed from the Jetson onboard camera.
Similar to the last example, the realtime image recognition demo is located in /aarch64/bin and is called imagenet-camera.
It runs on a live camera stream and, depending on user arguments, loads googlenet or alexnet with TensorRT.
$ ./imagenet-camera googlenet # to run using googlenet
$ ./imagenet-camera alexnet # to run using alexnet
The frames per second (FPS), classified object name from the video, and confidence of the classified object are printed to the OpenGL window title bar. By default the application can recognize up to 1000 different types of objects, since GoogleNet and AlexNet are trained on the ILSVRC12 ImageNet database which contains 1000 classes of objects. The mapping of names for the 1000 types of objects can be found in the repo under data/networks/ilsvrc12_synset_words.txt
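Each line of that file maps a WordNet synset ID to its class description string. For illustration, the entries look along these lines (example entries, abbreviated):

n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
...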
> **note**: by default, the Jetson's onboard CSI camera will be used as the video source. If you wish to use a USB webcam instead, change the `DEFAULT_CAMERA` define at the top of [`imagenet-camera.cpp`](imagenet-camera/imagenet-camera.cpp) to reflect the /dev/video V4L2 device of your USB camera. The model it's tested with is Logitech C920.
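For illustration, the change amounts to editing a define along these lines near the top of the file (assuming, as in the repo sources, that -1 selects the onboard CSI camera and a value >= 0 selects the corresponding /dev/video node):

#define DEFAULT_CAMERA -1        // -1 = onboard CSI camera
//#define DEFAULT_CAMERA 0       // e.g. USB webcam enumerated as /dev/video0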
The existing GoogleNet and AlexNet models that are downloaded by the repo are pre-trained on 1000 classes of objects.
What if you require a new object class to be added to the network, or otherwise require a different organization of the classes?
Using NVIDIA DIGITS, networks can be fine-tuned or re-trained from a pre-existing network model. After installing DIGITS on a PC or in the cloud (such as an AWS instance), see the Image Folder Specification to learn how to organize the data for your particular application.
Popular training databases with various annotations and labels include ImageNet, MS COCO, and Google Images among others.
See here under the Downloading the dataset section to obtain a crawler script that will download the 1000 original classes, including as many of the original images as are still available online.
> **note**: be considerate when running the crawler script from a corporate network, as the activity may be flagged. On a decent connection, it will probably take overnight to download the 1000 ILSVRC12 classes (100GB) from ImageNet (1.2TB).
Then, while creating the new network model in DIGITS, copy the GoogleNet prototxt and specify the existing GoogleNet caffemodel as the DIGITS Pretrained Model:
The network training should now converge faster than if it were trained from scratch. After the desired accuracy has been reached, copy the new model checkpoint back over to your Jetson and proceed as before, but now with the added classes available for recognition.
The previous image recognition examples output class probabilities representing the entire input image. The second deep learning capability to highlight is detecting multiple objects, and finding where in the video those objects are located (i.e. extracting their bounding boxes). This is performed using a 'detectNet' - or object detection / localization network.
The detectNet
object accepts the 2D image as input, and outputs a list of coordinates of the detected bounding boxes (a minimal usage sketch follows the list of models below). Three example detection network models are automatically downloaded during the repo source configuration:
- ped-100 (single-class pedestrian detector)
- multiped-500 (multi-class pedestrian + baggage detector)
- facenet-120 (single-class facial recognition detector)
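For a rough idea of how detectNet is used programmatically (what detectnet-console does under the hood), here is a minimal sketch. Only detectNet::Create() with the PEDNET_MULTI enum is confirmed by the excerpt further below; Detect(), GetMaxBoundingBoxes(), and the loadImageRGBA()/cudaAllocMapped() helpers are assumptions about the repo's interfaces, so consult detectNet.h for the actual signatures:

#include "detectNet.h"
#include "loadImage.h"          // assumed repo utility: loadImageRGBA()
#include "cudaMappedMemory.h"   // assumed repo utility: cudaAllocMapped()

#include <cstdint>
#include <cstdio>

int main( int argc, char** argv )
{
	// load the multi-class pedestrian + baggage model
	detectNet* net = detectNet::Create( detectNet::PEDNET_MULTI );

	if( !net )
		return 0;

	// load a test image into shared CPU/GPU memory as float4 RGBA
	float* imgCPU  = NULL;
	float* imgCUDA = NULL;
	int    imgWidth = 0, imgHeight = 0;

	if( !loadImageRGBA("peds-007.png", (float4**)&imgCPU, (float4**)&imgCUDA, &imgWidth, &imgHeight) )
		return 0;

	// allocate shared output memory for the bounding boxes (x1,y1,x2,y2 each)
	// NOTE: Detect() / GetMaxBoundingBoxes() are assumed by analogy with
	// imageNet::Classify() -- check detectNet.h for the real interface.
	float* bbCPU  = NULL;
	float* bbCUDA = NULL;
	const uint32_t maxBoxes = net->GetMaxBoundingBoxes();

	if( !cudaAllocMapped((void**)&bbCPU, (void**)&bbCUDA, maxBoxes * sizeof(float4)) )
		return 0;

	int numBoxes = maxBoxes;

	if( net->Detect(imgCUDA, imgWidth, imgHeight, bbCPU, &numBoxes) )
		printf("detected %i bounding boxes\n", numBoxes);

	delete net;
	return 0;
}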
To process test images with detectNet
and TensorRT, use the detectnet-console
program. detectnet-console
accepts command-line arguments representing the path to the input image and path to the output image (with the bounding box overlays rendered). Some test images are included with the repo:
$ ./detectnet-console peds-007.png output-7.png
To change the network that detectnet-console
uses, modify detectnet-console.cpp
(beginning line 33):
detectNet* net = detectNet::Create( detectNet::PEDNET_MULTI ); // uncomment to enable one of these
//detectNet* net = detectNet::Create( detectNet::PEDNET );
//detectNet* net = detectNet::Create( detectNet::FACENET );
Then to recompile, navigate to the jetson-inference/build directory and run make.
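For example, from a terminal on the Jetson:

$ cd jetson-inference/build
$ make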
When using the multiped-500 model (PEDNET_MULTI
), for images containing luggage or baggage in addition to pedestrians, the 2nd object class is rendered with a green overlay.
$ ./detectnet-console peds-008.png output-8.png
Similar to the previous example, detectnet-camera
runs the object detection networks on live video feed from the Jetson onboard camera. Launch it from command line along with the type of desired network:
$ ./detectnet-camera multiped # run using multi-class pedestrian/luggage detector
$ ./detectnet-camera ped-100 # run using original single-class pedestrian detector
$ ./detectnet-camera facenet # run using facial recognition network
$ ./detectnet-camera # by default, program will run using multiped
> **note**: to achieve maximum performance while running detectnet, increase the Jetson TX1 clock limits by running the script:
> `sudo ~/jetson_clocks.sh`
> **note**: by default, the Jetson's onboard CSI camera will be used as the video source. If you wish to use a USB webcam instead, change the `DEFAULT_CAMERA` define at the top of [`detectnet-camera.cpp`](detectnet-camera/detectnet-camera.cpp) to reflect the /dev/video V4L2 device of your USB camera. The model it's tested with is Logitech C920.
For a step-by-step guide to training custom DetectNets, see the Object Detection example included in DIGITS version 4:
The DIGITS guide above uses the KITTI dataset, however MS COCO also has bounding data available for a variety of objects.
Listed below are additional links and resources for deep learning developers:
- Appendix
- NVIDIA Deep Learning Institute — Introductory QwikLabs
- Building nvcaffe
- Other Examples
- ros_deep_learning - TensorRT inference ROS nodes