Skip to content

Latest commit

 

History

History
194 lines (138 loc) · 11.8 KB

README.md

File metadata and controls

194 lines (138 loc) · 11.8 KB

Experiments of inference using onnx models

Models

We look at 30 ONNX models from the ONNX model zoo. Some models do not support bach size more than 1 when running MXNet or other backends. To run the models in MXNet with batch size > 1, we use equivalent MXNet models form other sources.

ID Model Name The original ONNX model supports batch size > 1? The source mode to run in MXNet Notes
0 ArcFace yes ONNX
1 BVLC_AlexNet no GluonCV
2 BVLC_CaffeNet no Caffe
3 BVLC_GoogleNet no Caffe
4 BVLC_RCNN_ILSVRC13 no Caffe
5 DenseNet-121 yes ONNX
6 DUC yes ONNX
7 Emotion-FerPlus no None The original ONNX model is converted from CNTK
8 Inception-v1 no MXNet Model Server
9 Inception-v2 no None The original ONNX model is converted from Caffe2
10 MNIST no ONNX LeNet. The original ONNX model is converted from CNTK, which does not run. We trained a MXNet LeNet and converted it into ONNX.
11 MobileNet-v2 yes ONNX
12 ResNet018-v1 yes ONNX
13 ResNet018-v2 yes ONNX
14 ResNet034-v1 yes ONNX
15 ResNet034-v2 yes ONNX
16 ResNet050-v1 yes ONNX
17 ResNet050-v2 yes ONNX
18 ResNet101-v1 yes ONNX
19 ResNet101-v2 yes ONNX
20 ResNet152-v1 yes ONNX
21 ResNet152-v2 yes ONNX
22 Shufflenet yes ONNX
23 Squeezenet-v1.1 yes ONNX
24 Tiny_YOLO-v2 yes ONNX
25 VGG16-BN yes ONNX
26 VGG16 yes ONNX
27 VGG19-BN yes ONNX
28 VGG19 yes ONNX
29 Zfnet512 no GluonCV2 The original ONNX model is converted from Caffe2

Install Requirements

  1. GPU
pyenv virtualenv miniconda3-4.3.30 dlperf
pyenv activate dlperf

pip install onnx onnxmltools
pip install future click numba
pip install onnxruntime-gpu
pip install mxnet-cu101mkl
pip install gluoncv
pip install tensorflow-gpu
pip install https://download.pytorch.org/whl/cu100/torch-1.1.0-cp37-cp37m-linux_x86_64.whl
pip install https://download.pytorch.org/whl/cu100/torchvision-0.3.0-cp37-cp37m-linux_x86_64.whl
  1. CPU

Use pipenv to launch a shell

pyenv virtualenv miniconda3-4.3.30 dlperf
pyenv activate dlperf

Then install the packages using pip

pip install onnx gluoncv mxnet onnxmltools onnxruntime torchvision torch tensorflow onnx-tf future tvm numba click pycodestyle

Setup the Environment

Run

./setup_en.sh

which set up the following environment variables.

  • Disable Autotune
export MXNET_CUDNN_AUTOTUNE_DEFAULT=0
export TF_CUDNN_USE_AUTOTUNE=0
  • Disable TensorCores
export MXNET_CUDA_ALLOW_TENSOR_CORE=0
export TF_DISABLE_CUDNN_TENSOR_OP_MATH=0
  • Disable bulk mode in MXNet
export MXNET_EXEC_BULK_EXEC_INFERENCE=0
export MXNET_EXEC_BULK_EXEC_TRAIN=0

Run Models

  1. Run ONNX models with various backends
python main.py --debug --backend=mxnet
python main.py --debug --backend=onnxruntime
python main.py --debug --backend=caffe2

Some models only support batch size = 1, see Models.

  1. Run MXNet models from GluonCV
python mxnet/gluon_forward.py --num_warmup=1 --num_iterations=1 --model_name=alexnet --model_idx=1 --batch_size=1
  1. Run MXNet models from XXX-symbo.json and XXX-0000.params
python mxnet/local_forward.py --num_warmup=1 --num_iterations=1 --model_name=bvlc_caffenet --model_idx=1 --batch_size=1
  1. Run experiments with scripts

Run all MXNet models with

./scripts/run_allmodels_mxnet.sh

Profiling

  1. Run with cudnn logging
CUDNN_LOGINFO_DBG=1 CUDNN_LOGDEST_DBG=cudnn.log python main.py --debug --backend=mxnet
  1. Run with cublas logging
CUBLAS_LOGINFO_DBG=1 CUBLAS_LOGDEST_DBG=cublas.logpython main.py --debug --backend=mxnet
  1. Run with full logging
CUBLAS_LOGINFO_DBG=1 CUBLAS_LOGDEST_DBG=cublas.log CUDNN_LOGINFO_DBG=1 CUDNN_LOGDEST_DBG=cudnn.log python main.py --debug --backend=mxnet
  1. Profile using nvprof
nvprof --profile-from-start off --export-profile profiler_output.nvvp -f --print-summary python main.py --backend=mxnet  --num_warmup=1 --num_iterations=1 --model_idx=1
  1. Profile using Nsight

Output to out.qdstrm to be visualized in Nsight GUI,

nsys profile --trace=cuda,cudnn,cublas --output=out.qdstrm python main.py --backend=mxnet --num_warmup=1 --num_iterations=1 --model_idx=1

Create a sqlite database based on the data collected.

nsys profile --trace=cuda,cudnn,cublas --output=out.qdstrm --export=sqlite python main.py --backend=mxnet --num_warmup=1 --num_iterations=1 --model_idx=1

MXNET Optimization

TensorRT

  1. Download TensorRT 5.1.x.x for Ubuntu 18.04 and CUDA 10.1 tar package from https://developer.nvidia.com/nvidia-tensorrt-download.

    wget https://developer.download.nvidia.com/compute/machine-learning/tensorrt/secure/5.1/ga/tars/TensorRT-5.1.5.0.Ubuntu-14.04.5.x86_64-gnu.cuda-10.1.cudnn7.5.tar.gz?fJfT5Un1lcVLX6aTm89YH629UhBMhoyMnb8HdlyVBZ88L5hrC7wwuzkb6sO63qnAY7daItOQus4c3W26kXBA_lx85AUPzImocwEUruEBu03qDyHSUoVqCHBY5C46WL9tOfug-qGNSJ4b-9Jc2aE48YQkymPsgH3AU9twHL8ghhlzzw3aUqZhRh98aUi6kydjT_nMvjt8IImTL8Juhk3mmb_SHMW8mW8xlrs7RhfVKdTw70MRhMtRrQ
    
  2. Extract archive

     tar xzvf TensorRT-5.1.x.x.<os>.<arch>-gnu.cuda-x.x.cudnn7.x.tar.gz
    
  3. Install uff python package using pip

     cd TensorRT-5.1.x.x/python
     pip install tensorrt-5.1.x.x-cp3x-none-linux_x86_64.whl