ONNX Runtime Server provides an easy way to start an inferencing server that serves predictions over gRPC endpoints.
The CLI usage is shown below:
```shell
$ ./onnxruntime_server --helpfull
onnxruntime_server: ./onnxruntime_server --model_path trained.onnx

  Flags from onnxruntime_server.cpp:
    --address (The base server address); default: "0.0.0.0";
    --grpc_port (GRPC port to listen to requests); default: 50051;
    --log_level (Logging level. Allowed options (case sensitive): info,
      warning, error, fatal); default: INFO;
    --model_path (Path to ONNX model); default: ;
    --num_threads (Number of server threads); default: 0;
```
Note: the only mandatory argument is `--model_path`.
To host an ONNX model as an inferencing server, simply run:

```shell
./onnxruntime_server --model_path /<your>/<model>/<path>
```
The Abseil C++ library is cloned as a submodule. Run the following commands after cloning this repository:

```shell
git submodule init
git submodule update
```
Download the ONNX Runtime release for your architecture.
You also need to build and install gRPC; follow the gRPC Quick start guide.
View and inspect ONNX model properties using Netron, and note the model's input names and expected tensor sizes.
onnxruntime_server implements a gRPC service whose request input names and response output names map directly to those ONNX model properties.
Generate the Makefile:

```shell
% mkdir -p build && cd build
% cmake -DCMAKE_PREFIX_PATH=/<your>/<grpc>/<path> -DONNXRuntime_ROOT_DIR=/<your>/<onnxruntime>/<path>/onnxruntime-osx-x86_64-1.16.3 ..
```
Build the sources:

```shell
% make
...
[ 55%] Built target inference_grpc_proto
[ 66%] Building CXX object CMakeFiles/onnxruntime_serving.dir/serving.cc.o
[ 77%] Linking CXX static library libonnxruntime_serving.a
[ 77%] Built target onnxruntime_serving
[ 88%] Linking CXX executable onnxruntime_server
[100%] Built target onnxruntime_server
```
- Abseil - An open-source collection of C++ code (compliant to C++11) designed to augment the C++ standard library.
To use the gRPC endpoint, the protobuf definition can be found here. You can generate a client from it and make gRPC calls to the server. To learn more about generating client code and calling the server, refer to the gRPC tutorials.
You can change `--num_threads` to optimize server utilization; the default value of 0 resolves to the number of CPU cores on the host machine.
The following Visual Studio Code extensions are highly recommended for working with this project:
- C/C++ for Visual Studio Code - Provides rich C and C++ language support, including features such as IntelliSense, debugging, and code navigation.
- CMake for Visual Studio Code - Enables convenient configuration and building of CMake projects within VS Code.
- CMake Tools - Provides additional CMake support, including capabilities for configuring, building, and testing CMake projects.
This project is licensed under the MIT License - see the LICENSE.md file for details.