From 582287ea89df54114c924503db854501c9e67ee4 Mon Sep 17 00:00:00 2001
From: David Goodwin
Date: Mon, 22 Jul 2019 14:42:29 -0700
Subject: [PATCH] Update README and versions for 19.07 release

---
 Dockerfile |   8 +-
 README.rst | 215 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 VERSION    |   2 +-
 3 files changed, 217 insertions(+), 8 deletions(-)

diff --git a/Dockerfile b/Dockerfile
index 656402027f..f8e6f112fb 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -163,8 +163,8 @@ RUN python3 /workspace/onnxruntime/tools/ci_build/build.py --build_dir /workspac
 ############################################################################
 FROM ${BASE_IMAGE} AS trtserver_build
 
-ARG TRTIS_VERSION=1.4.0dev
-ARG TRTIS_CONTAINER_VERSION=19.07dev
+ARG TRTIS_VERSION=1.4.0
+ARG TRTIS_CONTAINER_VERSION=19.07
 
 # libgoogle-glog0v5 is needed by caffe2 libraries.
 RUN apt-get update && \
@@ -301,8 +301,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}
 
-ARG TRTIS_VERSION=1.4.0dev
-ARG TRTIS_CONTAINER_VERSION=19.07dev
+ARG TRTIS_VERSION=1.4.0
+ARG TRTIS_CONTAINER_VERSION=19.07
 
 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
 ENV NVIDIA_TENSORRT_SERVER_VERSION ${TRTIS_CONTAINER_VERSION}
diff --git a/README.rst b/README.rst
index 9701d2c04c..85739ab0a9 100644
--- a/README.rst
+++ b/README.rst
@@ -30,13 +30,222 @@
 NVIDIA TensorRT Inference Server
 ================================
 
-  **NOTE: You are currently on the r19.07 branch which tracks
-  stabilization towards the next release. This branch is not usable
-  during stabilization.**
+  **NOTICE: The r19.07 branch has converted to using CMake
+  to build the server, clients and other artifacts. Read the new
+  documentation carefully to understand the new** `build process
+  `_.
 
 .. overview-begin-marker-do-not-remove
 
+The NVIDIA TensorRT Inference Server provides a cloud inferencing
+solution optimized for NVIDIA GPUs. The server provides an inference
+service via an HTTP or GRPC endpoint, allowing remote clients to
+request inferencing for any model being managed by the server.
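To make the HTTP/GRPC request path described in the preceding paragraph concrete, here is a minimal client sketch based on the ``InferContext`` API from the Python client library that accompanies this release. It is illustrative only: the model name ``simple`` and the tensor names ``INPUT0``/``INPUT1``/``OUTPUT0``/``OUTPUT1`` are placeholders that must match the configuration of an actual model in the server's model repository, and the shapes and datatypes below are assumptions.

.. code-block:: python

    import numpy as np
    from tensorrtserver.api import InferContext, ProtocolType

    # Connect to the server's HTTP endpoint (port 8000 by default); use
    # ProtocolType.GRPC and the GRPC port for the GRPC endpoint instead.
    # The -1 selects the latest available version of the model.
    ctx = InferContext("localhost:8000", ProtocolType.HTTP, "simple", -1)

    # Placeholder inputs: shapes and datatypes must match the model
    # configuration of the model being requested.
    input0 = np.arange(16, dtype=np.int32)
    input1 = np.ones(16, dtype=np.int32)

    # Run a batch-size-1 request and ask for both outputs as raw tensors.
    result = ctx.run(
        {"INPUT0": (input0,), "INPUT1": (input1,)},
        {"OUTPUT0": InferContext.ResultFormat.RAW,
         "OUTPUT1": InferContext.ResultFormat.RAW},
        1)

    print(result["OUTPUT0"][0])
    print(result["OUTPUT1"][0])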
+
+What's New In 1.4.0
+-------------------
+
+* Added libtorch as a new backend. PyTorch models manually decorated
+  or automatically traced to produce TorchScript can now be run
+  directly by the inference server.
+
+* Build system converted from bazel to CMake. The new CMake-based
+  build system is more transparent, portable and modular.
+
+* To simplify the creation of custom backends, a Custom Backend SDK
+  and improved documentation are now available.
+
+* Improved AsyncRun API in C++ and Python client libraries.
+
+* perf_client can now use user-supplied input data (previously
+  perf_client could only use random or zero input data).
+
+* perf_client now reports latency at multiple confidence percentiles
+  (p50, p90, p95, p99) as well as a user-supplied percentile that is
+  also used to stabilize latency results.
+
+* Improvements to automatic model configuration creation
+  (-\\-strict-model-config=false).
+
+* C++ and Python client libraries now allow additional HTTP headers to
+  be specified when using the HTTP protocol.
+
+Features
+--------
+
+* `Multiple framework support
+  `_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel, ONNX, PyTorch, and Caffe2 NetDef model
+  formats. Also supports TensorFlow-TensorRT integrated
+  models. Variable-size input and output tensors are allowed if
+  supported by the framework. See `Capabilities
+  `_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  `_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. The inference server also supports
+  multiple `scheduling and batching
+  `_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  `_. The inference server
+  allows individual models to be implemented with custom backends
+  instead of by a deep-learning framework. With a custom backend, a
+  model can implement any logic desired, while still benefiting from
+  the GPU support, concurrent execution, dynamic batching and other
+  features provided by the server.
+
+* `Ensemble support
+  `_. An
+  ensemble represents a pipeline of one or more models and the
+  connection of input and output tensors between those models. A
+  single inference request to an ensemble will trigger the execution
+  of the entire pipeline.
+
+* Multi-GPU support. The server can distribute inferencing across all
+  system GPUs.
+
+* The inference server `monitors the model repository
+  `_
+  for any change and dynamically reloads the model(s) when necessary,
+  without requiring a server restart. Models and model versions can be
+  added and removed, and model configurations can be modified while
+  the server is running.
+
+* `Model repositories
+  `_
+  may reside on a locally accessible file system (e.g. NFS) or in
+  Google Cloud Storage.
+
+* Readiness and liveness `health endpoints
+  `_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+
+* `Metrics
+  `_
+  indicating GPU utilization, server throughput, and server latency.
+
 .. overview-end-marker-do-not-remove
 
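The readiness, liveness, and metrics bullets above map to plain HTTP endpoints, so a deployment can probe them without the client libraries. The sketch below is a suggestion only: it assumes the default ports used by the released containers (8000 for the HTTP endpoint, 8002 for Prometheus metrics) and relies on the third-party ``requests`` package; adjust the URLs if the server was started with different ports.

.. code-block:: python

    import requests

    HTTP_URL = "http://localhost:8000"     # HTTP endpoint (default port)
    METRICS_URL = "http://localhost:8002"  # Prometheus metrics (default port)

    # A 200 response from these endpoints indicates live/ready; anything
    # else should be treated as not live / not ready.
    live = requests.get(HTTP_URL + "/api/health/live", timeout=5)
    ready = requests.get(HTTP_URL + "/api/health/ready", timeout=5)
    print("live:", live.status_code == 200, "ready:", ready.status_code == 200)

    # Metrics are exported in Prometheus text format; the server's own
    # metrics (GPU utilization, request counts, latencies) use an nv_ prefix.
    metrics = requests.get(METRICS_URL + "/metrics", timeout=5)
    for line in metrics.text.splitlines():
        if line.startswith("nv_"):
            print(line)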
+The current release of the TensorRT Inference Server is 1.4.0 and
+corresponds to the 19.07 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) `_. The branch for
+this release is `r19.07
+`_.
+
+Backwards Compatibility
+-----------------------
+
+Continuing in version 1.4.0, the following interfaces maintain
+backwards compatibility with the 1.0.0 release. If you have model
+configuration files, custom backends, or clients that use the
+inference server HTTP or GRPC APIs (either directly or through the
+client libraries) from releases prior to 1.0.0 (19.03), you should
+edit and rebuild those as necessary to match the version 1.0.0 APIs.
+
+These interfaces will maintain backwards compatibility for all future
+1.x.y releases (see below for exceptions):
+
+* Model configuration as defined in `model_config.proto
+  `_.
+
+* The inference server HTTP and GRPC APIs as defined in `api.proto
+  `_
+  and `grpc_service.proto
+  `_.
+
+* The custom backend interface as defined in `custom.h
+  `_.
+
+As new features are introduced, they may temporarily have beta status
+where they are subject to change in non-backwards-compatible
+ways. When they exit beta they will conform to the
+backwards-compatibility guarantees described above. Currently the
+following features are in beta:
+
+* In the model configuration defined in `model_config.proto
+  `_
+  the sections related to model ensembling are currently in beta. In
+  particular, the ModelEnsembling message will potentially undergo
+  non-backwards-compatible changes.
+
+
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation
+`_
+provide guidance on installing, building and running the latest
+release of the TensorRT Inference Server.
+
+You can also view the documentation for the `master branch
+`_
+and for `earlier releases
+`_.
+
+READMEs for deployment examples can be found in subdirectories of
+deploy/, for example, `deploy/single_server/README.rst
+`_.
+
+The `Release Notes
+`_
+and `Support Matrix
+`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by the inference server.
+
+Other Documentation
+^^^^^^^^^^^^^^^^^^^
+
+* `Maximizing Utilization for Data Center Inference with TensorRT
+  Inference Server
+  `_.
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  `_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  `_.
+
+Contributing
+------------
+
+Contributions to TensorRT Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing `_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reporting regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem.
+  The less time we spend on reproducing problems, the more time we
+  have to fix them
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause
diff --git a/VERSION b/VERSION
index f9c2aa2120..88c5fb891d 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.4.0dev
+1.4.0