
Releases: triton-inference-server/server

Release 1.7.0, corresponding to NGC container 19.10

30 Oct 00:03

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.7.0

  • A Client SDK container is now provided on NGC in addition to the inference server container. The client SDK container includes the client libraries and examples.

  • TensorRT optimization may now be enabled for any TensorFlow model by enabling the feature in the optimization section of the model configuration (see the configuration sketch after this list).

  • The ONNX Runtime backend now includes the TensorRT and OpenVINO execution providers. These providers are enabled in the optimization section of the model configuration.

  • Automatic configuration generation (--strict-model-config=false) now works correctly for TensorRT models with variable-sized inputs and/or outputs.

  • Multiple model repositories may now be specified on the command line. Optional command-line options can be used to explicitly load specific models from each repository.

  • Ensemble models are now pruned dynamically so that only models needed to calculate the requested outputs are executed.

  • The example clients now include a simple Go example that uses the GRPC API.
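
The TensorRT optimization item above relies on the optimization section of the model configuration. As a rough sketch, assuming the model-configuration grammar described in the server documentation and a hypothetical TensorFlow graphdef model, the relevant part of config.pbtxt might look like the following; the exact field names should be verified against the documentation for this release.

    # config.pbtxt sketch for a hypothetical TensorFlow model.
    # The optimization section requests the TensorRT execution accelerator;
    # the ONNX Runtime TensorRT/OpenVINO providers noted above are reportedly
    # enabled through this same section.
    name: "my_tf_model"
    platform: "tensorflow_graphdef"
    max_batch_size: 8
    input [
      { name: "INPUT0", data_type: TYPE_FP32, dims: [ 224, 224, 3 ] }
    ]
    output [
      { name: "OUTPUT0", data_type: TYPE_FP32, dims: [ 1000 ] }
    ]
    optimization {
      execution_accelerators {
        gpu_execution_accelerator : [ { name : "tensorrt" } ]
      }
    }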

Known Issues

  • In TensorRT 6.0.1, reformat-free I/O is not supported.

  • Some versions of Google Kubernetes Engine (GKE) contain a regression in the handling of LD_LIBRARY_PATH that prevents the inference server container from running correctly (see issue 141255952). Use a GKE 1.13 or earlier version or a GKE 1.14.6 or later version to avoid this issue.

Client Libraries and Examples

Ubuntu 16.04 and Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v1.7.0_ubuntu1604.clients.tar.gz and v1.7.0_ubuntu1804.clients.tar.gz files. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files. The client SDK is also available as an NGC container.

Custom Backend SDK

Ubuntu 16.04 and Ubuntu 18.04 builds of the custom backend SDK are included in this release in the attached v1.7.0_ubuntu1604.custombackend.tar.gz and v1.7.0_ubuntu1804.custombackend.tar.gz files. See the documentation section 'Building a Custom Backend' for more information on using these files.

Release 1.6.0, corresponding to NGC container 19.09

27 Sep 21:50

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.6.0

  • Added TensorRT 6 support, which includes support for TensorRT dynamic
    shapes.

  • Shared memory support is added as an alpha feature in this release. This
    support allows input and output tensors to be communicated via shared
    memory instead of over the network. Currently only system (CPU) shared
    memory is supported.

  • Amazon S3 is now supported as a remote file system for model repositories.
    Use the s3:// prefix on model repository paths to reference S3 locations
    (see the example after this list).

  • The inference server library API is available as a beta in this release.
    The library API allows you to link against libtrtserver.so so that you can
    include all the inference server functionality directly in your application.

  • GRPC endpoint performance improvement. The inference server’s GRPC endpoint
    now uses significantly less memory while delivering higher performance.

  • The ensemble scheduler is now more flexible in allowing batching and
    non-batching models to be composed together in an ensemble.

  • The ensemble scheduler will now keep tensors in GPU memory between models
    when possible. Doing so significantly increases performance of some ensembles
    by avoiding copies to and from system memory.

  • The performance client, perf_client, now supports models with variable-sized
    input tensors.
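
A minimal sketch of pointing the server at an S3-hosted model repository, assuming the trtserver binary name and the --model-repository flag used by this release; the bucket path is a placeholder.

    # Point the server at a model repository stored in an S3 bucket
    # (placeholder bucket name). AWS credentials are assumed to come from
    # the usual AWS environment variables or configuration.
    trtserver --model-repository=s3://my-bucket/model_repository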

Known Issues

  • The ONNX Runtime backend could not be updated to the 0.5.0 release due to multiple performance and correctness issues with that release.

  • In TensorRT 6:

    • Reformat-free I/O is not supported.
    • Only models that have a single optimization profile are currently supported.

  • Google Kubernetes Engine (GKE) version 1.14 contains a regression in the handling of LD_LIBRARY_PATH that prevents the inference server container from running correctly (see issue 141255952). Use a GKE 1.13 or earlier version to avoid this issue.

Client Libraries and Examples

Ubuntu 16.04 and Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v1.6.0_ubuntu1604.clients.tar.gz and v1.6.0_ubuntu1804.clients.tar.gz files. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files.

Custom Backend SDK

Ubuntu 16.04 and Ubuntu 18.04 builds of the custom backend SDK are included in this release in the attached v1.6.0_ubuntu1604.custombackend.tar.gz and v1.6.0_ubuntu1804.custombackend.tar.gz files. See the documentation section 'Building a Custom Backend' for more information on using these files.

Release 1.5.0, corresponding to NGC container 19.08

03 Sep 23:53

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.5.0

  • Added a new execution mode that allows the inference server to start without
    loading any models from the model repository. Model loading and unloading
    are then controlled by a new GRPC/HTTP model control API.

  • Added a new instance-group mode that allows TensorFlow models that
    explicitly distribute inferencing across multiple GPUs to run in that
    manner in the inference server (see the configuration sketch after this
    list).

  • Improved input/output tensor reshape to allow variable-sized dimensions in
    tensors being reshaped.

  • Added a C++ wrapper around the custom backend C API to simplify the creation
    of custom backends. This wrapper is included in the custom backend SDK.

  • Improved the accuracy of the compute statistic reported for inference
    requests. Previously the compute statistic included some additional time
    beyond the actual compute time.

  • The performance client, perf_client, now reports more information for ensemble
    models, including statistics for all contained models and the entire ensemble.
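
The new instance-group mode above concerns TensorFlow models that manage their own multi-GPU placement. One plausible configuration sketch, assuming the instance_group settings documented for the model configuration (the KIND_MODEL kind shown here is an assumption, not quoted from this release):

    # config.pbtxt fragment: a single model-managed instance, so the
    # TensorFlow graph's own device placement (possibly spanning multiple
    # GPUs) is preserved. KIND_MODEL is an assumption; check the
    # instance-group documentation for the exact kind.
    instance_group [
      {
        count: 1
        kind: KIND_MODEL
      }
    ]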

Client Libraries and Examples

Ubuntu 16.04 and Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v1.5.0_ubuntu1604.clients.tar.gz and v1.5.0_ubuntu1804.clients.tar.gz files. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files.

Custom Backend SDK

Ubuntu 16.04 and Ubuntu 18.04 builds of the custom backend SDK are included in this release in the attached v1.5.0_ubuntu1604.custombackend.tar.gz and v1.5.0_ubuntu1804.custombackend.tar.gz files. See the documentation section 'Building a Custom Backend' for more information on using these files.

Release 1.4.0, corresponding to NGC container 19.07

30 Jul 23:06

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.4.0

  • Added libtorch as a new backend. PyTorch models manually decorated or automatically traced to produce TorchScript can now be run directly by the inference server.

  • Build system converted from Bazel to CMake. The new CMake-based build system is more transparent, portable, and modular.

  • To simplify the creation of custom backends, a Custom Backend SDK and improved documentation are now available.

  • Improved AsyncRun API in C++ and Python client libraries.

  • perf_client can now use user-supplied input data (previously perf_client could only use random or zero input data).

  • perf_client now reports latency at multiple confidence percentiles (p50, p90, p95, p99) as well as a user-supplied percentile that is also used to stabilize latency results.

  • Improvements to automatic model configuration creation (--strict-model-config=false); see the example after this list.

  • C++ and Python client libraries now allow additional HTTP headers to be specified when using the HTTP protocol.
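
A minimal sketch of running the server with automatic model-configuration generation, assuming the trtserver binary name; the repository path is a placeholder.

    # Let the server derive model configurations automatically instead of
    # requiring a complete config.pbtxt for every model.
    trtserver --model-repository=/models --strict-model-config=false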

Known Issues

  • Google Cloud Storage (GCS) support, which was not available in the 19.06 release, has been restored in this release.

Client Libraries and Examples

Ubuntu 16.04 and Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v1.4.0_ubuntu1604.clients.tar.gz and v1.4.0_ubuntu1804.clients.tar.gz files. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files.

Custom Backend SDK

Ubuntu 16.04 and Ubuntu 18.04 builds of the custom backend SDK are included in this release in the attached v1.4.0_ubuntu1604.custombackend.tar.gz and v1.4.0_ubuntu1804.custombackend.tar.gz files. See the documentation section 'Building a Custom Backend' for more information on using these files.

Release 1.3.0, corresponding to NGC container 19.06

28 Jun 16:36

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.3.0

  • The ONNX Runtime (github.com/Microsoft/onnxruntime) is now integrated into the inference server. ONNX models can now be used directly in a model repository.

  • The HTTP health port may now be specified independently of the inference and status HTTP port with the --http-health-port flag (see the example after this list).

  • Fixed a bug in perf_client that caused high CPU usage and could lower the measured inferences/sec in some cases.
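
A minimal sketch of the new health-port flag, assuming the trtserver binary name; the port and repository path are placeholders.

    # Serve the health endpoint on its own port (8080 here is arbitrary)
    # while inference and status remain on the default HTTP port.
    trtserver --model-repository=/models --http-health-port=8080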

Known Issues

  • Google Cloud Storage (GCS) support is not available in the 19.06 release. Support for GCS is available on the master branch and will be re-enabled in the 19.07 release.

Client Libraries and Examples

Ubuntu 16.04 and Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v1.3.0_ubuntu1604.clients.tar.gz and v1.3.0_ubuntu1804.clients.tar.gz files. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files.

Release 1.2.0, corresponding to NGC container 19.05

24 May 16:20

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.2.0

  • Ensembling is now available. An ensemble represents a pipeline of one or more models and the connection of input and output tensors between those models. A single inference request to an ensemble will trigger the execution of the entire pipeline.

  • Added Helm chart that deploys a single TensorRT Inference Server into a Kubernetes cluster.

  • The client Makefile now supports building for both Ubuntu 16.04 and Ubuntu 18.04. The Python wheel produced from the build is now compatible with both Python2 and Python3.

  • The perf_client application now has a --percentile flag that can be used to report latencies instead of reporting average latency (which remains the default). For example, using --percentile=99 causes perf_client to report the 99th percentile latency (see the example after this list).

  • The perf_client application now has a -z option to use zero-valued input tensors instead of random values.

  • Improved error reporting of incorrect input/output tensor names for TensorRT models.

  • Added --allow-gpu-metrics option to enable/disable reporting of GPU metrics.
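
A minimal perf_client sketch combining the two new flags above; "my_model" is a placeholder model name, and the -m flag for selecting the model is assumed from the perf_client documentation.

    # Report 99th-percentile latency and send zero-valued input tensors.
    perf_client -m my_model --percentile=99 -z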

Client Libraries and Examples

Ubuntu 16.04 and Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v1.2.0_ubuntu1604.clients.tar.gz and v1.2.0_ubuntu1804.clients.tar.gz files. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files.

Release 1.1.0, corresponding to NGC container 19.04

24 Apr 00:07

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.1.0

  • Client libraries and examples now build with a separate Makefile (a Dockerfile is also included for convenience).

  • Input or output tensors with variable-size dimensions (indicated by -1 in the model configuration) can now represent tensors where the variable dimension has value 0 (zero).

  • Zero-sized input and output tensors are now supported for batching models. This enables the inference server to support models that require inputs and outputs that have shape [ batch-size ].

  • TensorFlow custom operations (C++) can now be built into the inference server. An example and documentation are included in this release.

Client Libraries and Examples

An Ubuntu 16.04 build of the client libraries and examples is included in this release in the attached v1.1.0.clients.tar.gz. See the documentation section 'Building the Client Libraries and Examples' for more information on using this file.

Release 1.0.0, corresponding to NGC container 19.03

18 Mar 20:11

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.0.0

  • 1.0.0 is the first GA, non-beta, release of TensorRT Inference Server. See the README for information on backwards-compatibility guarantees for this and future releases.

  • Added support for stateful models and backends that require multiple inference requests to be routed to the same model instance/batch slot. The new sequence batcher provides scheduling and batching capabilities for this class of models (see the configuration sketch after this list).

  • Added GRPC streaming protocol support for inference requests.

  • The HTTP front-end is now asynchronous to enable lower-latency and higher-throughput handling of inference requests.

  • Enhanced perf_client to support stateful models and backends.
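
For the sequence batcher item above, a sketch of what a stateful model's configuration could look like, assuming the sequence_batching section and the sequence start/ready control inputs described in the server documentation; the tensor names, control kinds, and idle timeout shown here are illustrative only.

    # config.pbtxt fragment for a stateful model using the sequence batcher.
    # Control tensor names and the idle timeout are placeholders; the control
    # kinds are assumptions based on the sequence batcher documentation.
    sequence_batching {
      max_sequence_idle_microseconds: 5000000
      control_input [
        {
          name: "START"
          control [ { kind: CONTROL_SEQUENCE_START, fp32_false_true: [ 0, 1 ] } ]
        },
        {
          name: "READY"
          control [ { kind: CONTROL_SEQUENCE_READY, fp32_false_true: [ 0, 1 ] } ]
        }
      ]
    }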

Client Libraries and Examples

An Ubuntu 16.04 build of the client libraries and examples is included in this release in the attached v1.0.0.clients.tar.gz. See the documentation section 'Building the Client Libraries and Examples' for more information on using this file.

Release 0.11.0 beta, corresponding to NGC container 19.02

28 Feb 02:32

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 0.11.0 Beta

  • Variable-size input and output tensor support. Models that accept variable-size input tensors and produce variable-size output tensors are now supported in the model configuration by using a dimension size of -1 for those dimensions that can take on any size (see the configuration sketch after this list).

  • String datatype support. For TensorFlow models and custom backends, input and output tensors can contain strings.

  • Improved support for non-GPU systems. The inference server will run correctly on systems that do not contain GPUs and that do not have nvidia-docker or CUDA installed.
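
A configuration sketch of the variable-size dimension and string datatype items above, with placeholder tensor names; the exact type and field names should be checked against the model-configuration documentation.

    # config.pbtxt fragment: -1 marks a dimension whose size can vary per
    # request; TYPE_STRING is used for a string-valued input.
    input [
      {
        name: "TEXT_INPUT"
        data_type: TYPE_STRING
        dims: [ -1 ]
      }
    ]
    output [
      {
        name: "SCORES"
        data_type: TYPE_FP32
        dims: [ -1 ]
      }
    ]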

Client Libraries and Examples

An Ubuntu 16.04 build of the client libraries and examples is included in this release in the attached v0.11.0.clients.tar.gz. See the documentation section 'Building the Client Libraries and Examples' for more information on using this file.

Release 0.10.0 beta, corresponding to NGC container 19.01

28 Jan 21:03

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server (TRTIS) provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 0.10.0 Beta

  • Custom backend support. TRTIS allows individual models to be implemented with custom backends instead of by a deep-learning framework. With a custom backend a model can implement any logic desired, while still benefiting from the GPU support, concurrent execution, dynamic batching and other features provided by TRTIS.