Description
I created the same issue at Torch-TensorRT because I don't know which side (Triton server or Torch-TensorRT) is causing the problem: pytorch/TensorRT#3248
I'm trying to serve a Torch-TensorRT optimized model with the NVIDIA Triton server based on the tutorial at https://pytorch.org/TensorRT/tutorials/serving_torch_tensorrt_with_triton.html
When I try to perform inference through the Triton server, I get the error:
ERROR: [Torch-TensorRT] - IExecutionContext::enqueueV3: Error Code 1: Cuda Runtime (invalid resource handle)
Triton Information
What version of Triton are you using?
I'm using the Triton container and PyTorch container from NGC:
PyTorch for model optimization: nvcr.io/nvidia/pytorch:24.09-py3
Triton for hosting: nvcr.io/nvidia/tritonserver:24.09-py3
I have also tested with 24.03-py3; the same error occurs.
To Reproduce
I run the following code to create the Torch-TensorRT optimized TorchScript module:
import torch
import torch_tensorrt

torch.hub._validate_not_a_forked_repo = lambda a, b, c: True

# Load the pretrained ResNet-50 and trace it to TorchScript
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True).eval().to("cuda")
model_jit = torch.jit.trace(model, [torch.rand(1, 3, 224, 224).cuda()])

# Compile with Torch-TensorRT
trt_model = torch_tensorrt.compile(
    model_jit,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half}  # Run with FP16
)

# Save the compiled TorchScript module
torch.jit.save(trt_model, "model.pt")
I create the model serving folder the same way suggested in the tutorial
with the config file.
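For reference, the serving folder and config look roughly like the tutorial's ResNet-50 example. This is a sketch rather than my exact files; the model name resnet50, the input__0/output__0 tensor names, and the dims follow the tutorial's conventions for the libtorch backend:

model_repository/
    resnet50/
        config.pbtxt
        1/
            model.pt

config.pbtxt:
name: "resnet50"
platform: "pytorch_libtorch"
max_batch_size: 0
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 1, 3, 224, 224 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1, 1000 ]
  }
]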
I can launch the Triton server itself successfully.
I then perform inference from Python.
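My client script is essentially the following. This is a minimal sketch using the tritonclient HTTP API rather than my exact code; the resnet50 model name and the input__0/output__0 tensor names are assumptions that have to match the config above:

import numpy as np
import tritonclient.http as httpclient

# Connect to the local Triton server (default HTTP port 8000)
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a random FP32 input matching the shape declared in config.pbtxt
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", [1, 3, 224, 224], "FP32")
infer_input.set_data_from_numpy(data)

# Request the output tensor and run inference
infer_output = httpclient.InferRequestedOutput("output__0")
response = client.infer(model_name="resnet50", inputs=[infer_input], outputs=[infer_output])
print(response.as_numpy("output__0").shape)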
And I get the error:
ERROR: [Torch-TensorRT] - IExecutionContext::enqueueV3: Error Code 1: Cuda Runtime (invalid resource handle)
Expected behavior
I'm following the tutorial from https://pytorch.org/TensorRT/tutorials/serving_torch_tensorrt_with_triton.html
It looks like that page was last updated a long time ago, so I don't know whether Triton is still supposed to support Torch-TensorRT optimized JIT models.
Thanks