Description
I created the same issue at Torch-TensorRT because I don't know which side (Triton server or Torch-TensorRT) is causing the problem: pytorch/TensorRT#3248
I'm trying to serve a Torch-TensorRT optimized model with the NVIDIA Triton server based on the tutorial at https://pytorch.org/TensorRT/tutorials/serving_torch_tensorrt_with_triton.html
When I try to perform inference through the Triton server, I get the error:
ERROR: [Torch-TensorRT] - IExecutionContext::enqueueV3: Error Code 1: Cuda Runtime (invalid resource handle)
Triton Information
What version of Triton are you using?
I'm using the Triton container and PyTorch container from NGC:
PyTorch for model optimization: nvcr.io/nvidia/pytorch:24.09-py3
Triton for hosting: nvcr.io/nvidia/tritonserver:24.09-py3
I have also tested with 24.03-py3; the same error occurs.
To Reproduce
I run the following code to create the Torch-TensorRT optimized TorchScript module:
import torch
import torch_tensorrt

torch.hub._validate_not_a_forked_repo = lambda a, b, c: True

# Load the pretrained ResNet-50 and trace it to TorchScript
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True).eval().to("cuda")
model_jit = torch.jit.trace(model, [torch.rand(1, 3, 224, 224).cuda()])

# Compile with Torch-TensorRT
trt_model = torch_tensorrt.compile(
    model_jit,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half}  # Run with FP16
)

# Save the compiled TorchScript module
torch.jit.save(trt_model, "model.pt")
I create the model serving folder the same way suggested in the tutorial
with the config file.
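For reference, the serving folder and config look roughly like the tutorial's ResNet-50 example. This is a sketch rather than my exact files; the model name resnet50, the input__0/output__0 tensor names, and the dims follow the tutorial's conventions for the libtorch backend:

model_repository/
    resnet50/
        config.pbtxt
        1/
            model.pt

config.pbtxt:
name: "resnet50"
platform: "pytorch_libtorch"
max_batch_size: 0
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 1, 3, 224, 224 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1, 1000 ]
  }
]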
I can launch the Triton server itself successfully.
I then perform inference from Python.
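My client script is essentially the following. This is a minimal sketch using the tritonclient HTTP API rather than my exact code; the resnet50 model name and the input__0/output__0 tensor names are assumptions that have to match the config above:

import numpy as np
import tritonclient.http as httpclient

# Connect to the local Triton server (default HTTP port 8000)
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a random FP32 input matching the shape declared in config.pbtxt
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", [1, 3, 224, 224], "FP32")
infer_input.set_data_from_numpy(data)

# Request the output tensor and run inference
infer_output = httpclient.InferRequestedOutput("output__0")
response = client.infer(model_name="resnet50", inputs=[infer_input], outputs=[infer_output])
print(response.as_numpy("output__0").shape)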
And I get the error:
ERROR: [Torch-TensorRT] - IExecutionContext::enqueueV3: Error Code 1: Cuda Runtime (invalid resource handle)
Expected behavior
I'm following the tutorial from https://pytorch.org/TensorRT/tutorials/serving_torch_tensorrt_with_triton.html
It looks like that page was last updated a long time ago, so I don't know whether Triton is still supposed to support Torch-TensorRT optimized JIT models.
Thanks