I am seeing significant output differences when compiling and running the facebook/bart-base (https://huggingface.co/facebook/bart-base) model with Torch-TensorRT, even after applying FP16 and various precision settings.
The outputs can be compared using the following code:
```python
import torch
from transformers import BartTokenizer, BartModel
import torch_tensorrt  # registers the "torch_tensorrt" backend

# Set device and backend
backend = "torch_tensorrt"
device = "cuda:0"

# Load tokenizer and model
tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')
model = BartModel.from_pretrained('facebook/bart-base')
model.eval()
model = model.to(device)

# Prepare input
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

# Run inference before Torch-TensorRT
outputs_before = model(**inputs)

# Apply Torch-TensorRT optimization
model = torch.compile(
    model,
    backend=backend,
    options={
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.float16, torch.float32},
    },
    dynamic=False,
)

# Run inference after Torch-TensorRT
outputs_after = model(**inputs)

# Compare outputs
last_hidden_states_before = outputs_before.last_hidden_state
last_hidden_states_after = outputs_after.last_hidden_state

# Calculate the maximum absolute difference
max_diff = torch.max(torch.abs(last_hidden_states_before - last_hidden_states_after)).item()

# Calculate the mean absolute difference
mean_abs_diff = torch.mean(torch.abs(last_hidden_states_before - last_hidden_states_after)).item()

# Calculate the plain mean of the differences (not absolute)
mean_diff = torch.mean(last_hidden_states_before - last_hidden_states_after).item()

# Print the outputs, max difference, mean absolute difference, and plain mean difference
print("Outputs before Torch-TensorRT:")
print(last_hidden_states_before)
print("\nOutputs after Torch-TensorRT:")
print(last_hidden_states_after)
print(f"\nMaximum absolute difference: {max_diff}")
print(f"Mean absolute difference: {mean_abs_diff}")
print(f"Mean difference: {mean_diff}")
```
Here are the differences I'm seeing:
Maximum absolute difference: 6.1822
Mean absolute difference: 0.8487
Mean difference: -0.0164
These values are much larger than expected.
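For calibration, here is a rough sketch of the error magnitude I would expect from a well-behaved FP16 compilation of a transformer encoder. The tensors and the 1e-2 tolerance below are my own stand-ins, not values from the model:

```python
import torch

torch.manual_seed(0)

# Stand-in for an eager FP32 last_hidden_state: (batch, seq_len, hidden)
reference = torch.randn(1, 8, 768)

# Stand-in for a healthy FP16-compiled output: the reference plus noise on
# the order of FP16 round-off (~1e-3 for values near 1.0)
compiled = reference + 1e-3 * torch.randn_like(reference)

max_diff = (reference - compiled).abs().max().item()
mean_abs_diff = (reference - compiled).abs().mean().item()

# A healthy FP16 run stays within roughly a 1e-2 absolute/relative tolerance
print(f"max_diff={max_diff:.4f}, mean_abs_diff={mean_abs_diff:.4f}")
print(torch.allclose(reference, compiled, rtol=1e-2, atol=1e-2))
```

By that yardstick, a maximum absolute difference of 6.18 and a mean absolute difference of 0.85 look like a correctness bug rather than precision loss.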
Additional Tests
I tried compiling the model with FP16 precision enabled using the following code, but the output differences remain significant:
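(The FP16 snippet itself did not survive the copy; the sketch below is my reconstruction, assuming the same `torch.compile` call as above with `enabled_precisions` narrowed to `torch.float16`.)

```python
import torch

try:
    import torch_tensorrt  # noqa: F401  (registers the "torch_tensorrt" backend)
except ImportError:
    torch_tensorrt = None

# FP16-only compile options (reconstruction; mirrors the options above with
# enabled_precisions narrowed to float16)
fp16_options = {
    "truncate_long_and_double": True,
    "enabled_precisions": {torch.float16},
}

def compile_fp16(model):
    """Compile a CUDA-resident model with the FP16-only settings (sketch)."""
    return torch.compile(
        model,
        backend="torch_tensorrt",
        options=fp16_options,
        dynamic=False,
    )
```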
@peri044 I tried running the additional tests you suggested, but I’m still seeing large differences in the output, as mentioned above. I would really appreciate it if you could share advice on this issue.
I also enabled "use_fp32_acc" and "use_explicit_typing", but the differences persisted:
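(That snippet was also lost in the copy; below is a sketch of how I passed the two flags, assuming they go through the same `options` dict. As I understand the docs, with `use_explicit_typing` the layer dtypes follow the model itself rather than being autotuned, so `enabled_precisions` is set to `{torch.float32}` only.)

```python
import torch

try:
    import torch_tensorrt  # noqa: F401  (registers the "torch_tensorrt" backend)
except ImportError:
    torch_tensorrt = None

# Options with FP32 accumulation and explicit typing (sketch; with explicit
# typing, precisions come from the model's own dtypes, so only float32 is
# listed in enabled_precisions)
strong_typing_options = {
    "truncate_long_and_double": True,
    "enabled_precisions": {torch.float32},
    "use_fp32_acc": True,
    "use_explicit_typing": True,
}

def compile_with_fp32_acc(model):
    """Compile a CUDA-resident model with FP32 matmul accumulation (sketch)."""
    return torch.compile(
        model,
        backend="torch_tensorrt",
        options=strong_typing_options,
        dynamic=False,
    )
```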