I am trying to convert both the HD-trained and DC-trained Vton Unet models to TensorRT to explore possible performance improvements. I was able to convert the model to ONNX first, but when I tried to verify the outputs on CPU, RAM usage went as high as 50 GB.
The tensor outputs from PyTorch inference and ONNX inference seem more or less the same, with the following difference measured by np.testing.assert_allclose().
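A minimal sketch of how such a comparison can be made, assuming both outputs are already available as NumPy arrays. The helper name `compare_outputs`, the tolerances, and the stand-in data are illustrative assumptions, not values from my actual run:

```python
import numpy as np

def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-5):
    """Return (max_abs_diff, max_rel_diff, passed) for two same-shaped arrays.

    `reference` plays the role of the PyTorch output and `candidate` the
    ONNX output. The tolerances are illustrative defaults (assumption).
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    abs_diff = np.abs(candidate - reference)
    # Relative difference against the reference; the epsilon avoids division by zero.
    rel_diff = abs_diff / (np.abs(reference) + 1e-12)
    try:
        # Passes when |candidate - reference| <= atol + rtol * |reference| elementwise.
        np.testing.assert_allclose(candidate, reference, rtol=rtol, atol=atol)
        passed = True
    except AssertionError:
        passed = False
    return float(abs_diff.max()), float(rel_diff.max()), passed

# Stand-in data with a hypothetical shape; replace with the real model outputs.
rng = np.random.default_rng(0)
torch_out = rng.standard_normal((1, 4, 64, 48)).astype(np.float32)
onnx_out = torch_out + np.float32(1e-6)  # simulate small numerical drift

max_abs, max_rel, passed = compare_outputs(torch_out, onnx_out)
print(f"max abs diff: {max_abs:.3e}, max rel diff: {max_rel:.3e}, passed: {passed}")
```

Reporting the maximum absolute and relative differences alongside the pass/fail result makes it easier to judge how far off the outputs are when the assertion does fail.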
With memory usage this high, I cannot convert the model to TensorRT. The main culprit is one tensor exceeding TensorRT's tensor size limit, and I assume this tensor is related to spatial_attn_inputs. Sharing the TensorRT conversion error logs here.
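The size check behind this kind of failure can be sketched in a few lines. This assumes the limit in question is the 2^31-1 elements-per-tensor cap that TensorRT reports as a tensor volume error. The shapes below are hypothetical attention-score shapes chosen for illustration, not the actual shape of the offending tensor:

```python
import math

# Assumption: the per-tensor element cap reported by TensorRT as
# "tensor volume exceeds (2^31)-1".
TRT_MAX_VOLUME = 2**31 - 1

def tensor_volume(shape):
    """Number of elements in a tensor with the given shape."""
    return math.prod(shape)

def exceeds_trt_limit(shape, limit=TRT_MAX_VOLUME):
    """True if a tensor of this shape has more elements than the assumed cap."""
    return tensor_volume(shape) > limit

# Hypothetical shapes laid out as (batch, heads, tokens, tokens).
for shape in [(1, 8, 12288, 12288), (2, 8, 12288, 12288)]:
    print(shape, tensor_volume(shape), exceeds_trt_limit(shape))
```

Because an attention score tensor grows with the square of the token count, a modest increase in spatial resolution or batch size can push it over the cap even when every other tensor in the graph is well within it.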
Do you think it would be possible to convert the model to TensorRT, or are there any graph-level optimizations we could apply to make it possible? For now, PyTorch 2.0 inference does seem superior.
No. I came to the conclusion that it is not possible to convert this model to TensorRT, because it exceeds the limits enforced by NVIDIA's TensorRT conversion, specifically the tensor volume limit mentioned in the error.