C++ Torch Server

Serve Torch models as a REST API using Drogon; an example is included for a ResNet-18 model trained on ImageNet. Benchmarks show roughly 6-10x better throughput and latency for ResNet-18 at peak load.

Build & Run Instructions

# Create optimized models for your machine.
$ python3 optimize_model_for_inference.py

# Build and Run Server
$ docker compose run --service-ports blaze
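
At startup the server loads the optimized TorchScript file produced by the script above. A minimal libtorch sketch of that load step (not the repo's exact code; the file name is hypothetical):

#include <torch/script.h>

int main() {
    // Load the TorchScript module written by optimize_model_for_inference.py
    // (file name hypothetical) and move it to the GPU for inference.
    torch::jit::script::Module model = torch::jit::load("resnet18_optimized.pt");
    model.to(torch::kCUDA);
    model.eval();
    return 0;
}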

Development

  • Add Docker to the CLion toolchain; this will set up all necessary dependencies.

Client Instructions

curl "localhost:8088/classify" -F "image=@images/cat.jpg"

Benchmarking Instructions

# Drogon + libtorch
for i in {0..8}; do curl "localhost:8088/classify" -F "image=@images/cat.jpg"; done # Run once to warm up.
wrk -t8 -c100 -d60 -s benchmark/upload.lua "http://localhost:8088/classify" --latency
# FastAPI + pytorch
cd benchmark/python_fastapi
python3 -m venv env
source env/bin/activate
python3 -m pip install -r requirements.txt # Run just once to install dependencies into the venv.
gunicorn main:app -w 2 -k uvicorn.workers.UvicornWorker --bind 127.0.0.1:8088 # 2 workers gave the best performance on my machine; 3 and 4 were also tried.
deactivate # Run after benchmarking is done and gunicorn has been stopped.

cd ../.. # back to root folder
for i in {0..8}; do curl "localhost:8088/classify" -F "image=@images/cat.jpg"; done
wrk -t8 -c100 -d60 -s benchmark/fastapi_upload.lua "http://localhost:8088/classify" --latency

Benchmarking results

Drogon + libtorch

# OS: Ubuntu 21.10 x86_64
# Kernel: 5.15.14-xanmod1
# CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz
# GPU: NVIDIA GeForce RTX 3070
Running 1m test @ http://localhost:8088/classify
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    39.30ms   10.96ms  95.51ms   70.50%
    Req/Sec   306.58     28.78   390.00     70.92%
  Latency Distribution
     50%   37.40ms
     75%   45.69ms
     90%   54.57ms
     99%   69.34ms
  146612 requests in 1.00m, 30.34MB read
Requests/sec:   2441.60
Transfer/sec:    517.41KB

FastAPI + pytorch

# OS: Ubuntu 21.10 x86_64
# Kernel: 5.15.14-xanmod1
# CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz
# GPU: NVIDIA GeForce RTX 3070
Running 1m test @ http://localhost:8088/classify
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   449.50ms  239.30ms   1.64s    70.39%
    Req/Sec    33.97     26.41   121.00     83.46%
  Latency Distribution
     50%  454.64ms
     75%  570.73ms
     90%  743.54ms
     99%    1.16s
  12981 requests in 1.00m, 2.64MB read
Requests/sec:    216.13
Transfer/sec:     44.96KB

Architecture

  • API request handling and model pre-processing in the Drogon controller controllers/ImageClass.cc
  • Batched model inference and post-processing logic in lib/ModelBatchInference.cpp (sketched below)
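
As an illustrative sketch of the batching idea (simplified, with hypothetical names; not the repo's actual code), a dedicated inference thread drains queued requests, stacks them into one batch tensor, and answers each request through a promise:

#include <torch/script.h>
#include <condition_variable>
#include <future>
#include <mutex>
#include <queue>
#include <vector>

struct Job {
    torch::Tensor input;                 // pre-processed 3x224x224 image tensor
    std::promise<torch::Tensor> output;  // fulfilled with this image's logits
};

std::queue<Job> jobs;
std::mutex mu;
std::condition_variable cv;

// Inference thread: drain whatever has queued up, one forward pass per batch.
void inferenceLoop(torch::jit::script::Module &model, size_t maxBatch) {
    torch::NoGradGuard noGrad;
    for (;;) {
        std::vector<Job> batch;
        {
            std::unique_lock<std::mutex> lock(mu);
            cv.wait(lock, [] { return !jobs.empty(); });
            while (!jobs.empty() && batch.size() < maxBatch) {
                batch.push_back(std::move(jobs.front()));
                jobs.pop();
            }
        }
        std::vector<torch::Tensor> inputs;
        inputs.reserve(batch.size());
        for (auto &job : batch) inputs.push_back(job.input);
        // Stack N images into one [N, 3, 224, 224] tensor and infer once.
        auto stacked = torch::stack(inputs).to(torch::kCUDA);
        auto logits = model.forward({stacked}).toTensor().to(torch::kCPU);
        for (size_t i = 0; i < batch.size(); ++i)
            batch[i].output.set_value(logits[i]);
    }
}

Each request handler would enqueue a Job, notify the condition variable, and block on the job's std::future until the batch containing its image has been inferred.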

TODOS

  • Multithreaded batched inference
  • FP16 Inference
  • Use C++20 coroutines for wait-free event-loop tasks
  • Add compiler optimizations to CMake.
  • Benchmark optimizations like channels-last memory format, ONNX, and TensorRT, and report which is faster.
  • Pin the batched tensor used for inference in memory and re-use it at every inference. No improvement.
  • Use Torch-TensorRT for inference, the fastest on CUDA devices. Cuts inference down from 5ms to 1-2ms.
  • Use Torch Nvjpeg for faster image decoding; the decode call currently takes ~2ms with libjpeg-turbo.
  • Int8 inference using FX Graph post-training quantization (ResNet Int8 quantization: example1, example2)
  • Benchmark framework against mosec
  • Use lockfree queues
  • Separate pre-processing, inference, and post-processing.
  • Added address & memory leak sanitizers to CMake.
  • Dockerize for easy usage.

Notes

  • WIP: it gets the job done for now and is tested regularly, but it is not yet production-ready.