Large output differences with facebook/bart-base model #3252

Open
Tracked by #3179
chohk88 opened this issue Oct 19, 2024 · 1 comment

@chohk88
Collaborator

chohk88 commented Oct 19, 2024

I am seeing significant output differences when compiling and running the facebook/bart-base (https://huggingface.co/facebook/bart-base) model with Torch-TensorRT, even after enabling FP16 and trying various precision settings.

Compare the output using the following code:

import torch
from transformers import BartTokenizer, BartModel
import torch_tensorrt

# Set device and backend
backend = "torch_tensorrt"
device = "cuda:0"

# Load tokenizer and model
tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')
model = BartModel.from_pretrained('facebook/bart-base')
model.eval()
model = model.to(device)

# Prepare input
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()} 

# Run inference before Torch-TensorRT
outputs_before = model(**inputs)

# Apply Torch-TensorRT optimization
model = torch.compile(
    model,
    backend=backend,
    options={
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.float16, torch.float32},
    },
    dynamic=False,
)

# Run inference after Torch-TensorRT
outputs_after = model(**inputs)

# Compare outputs
last_hidden_states_before = outputs_before.last_hidden_state
last_hidden_states_after = outputs_after.last_hidden_state

# Calculate the maximum absolute difference
max_diff = torch.max(torch.abs(last_hidden_states_before - last_hidden_states_after)).item()

# Calculate the mean absolute difference
mean_abs_diff = torch.mean(torch.abs(last_hidden_states_before - last_hidden_states_after)).item()

# Calculate the plain mean of the differences (not absolute)
mean_diff = torch.mean(last_hidden_states_before - last_hidden_states_after).item()

# Print the outputs, max difference, mean absolute difference, and plain mean difference
print("Outputs before Torch-TensorRT:")
print(last_hidden_states_before)
print("\nOutputs after Torch-TensorRT:")
print(last_hidden_states_after)

print(f"\nMaximum absolute difference: {max_diff}")
print(f"Mean absolute difference: {mean_abs_diff}")
print(f"Mean difference: {mean_diff}")

Here are the differences I'm seeing:

  • Maximum absolute difference: 6.1822
  • Mean absolute difference: 0.8487
  • Mean difference: -0.0164

These values are much larger than expected.
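
To put these numbers in context, it can help to look at scale-independent metrics alongside the absolute differences. The following is a minimal sketch and not part of the original script: `compare_outputs` is a hypothetical helper name, and it works on flat lists of floats in plain Python (a tensor could be passed as `t.flatten().tolist()`). It reports the relative L2 error and cosine similarity next to the maximum and mean absolute differences.

```python
import math


def compare_outputs(reference, candidate):
    """Return absolute and scale-independent error metrics for two equal-length float lists."""
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    if not reference:
        raise ValueError("inputs must not be empty")

    diffs = [r - c for r, c in zip(reference, candidate)]
    ref_norm = math.sqrt(sum(r * r for r in reference))
    cand_norm = math.sqrt(sum(c * c for c in candidate))
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    dot = sum(r * c for r, c in zip(reference, candidate))

    return {
        "max_abs_diff": max(abs(d) for d in diffs),
        "mean_abs_diff": sum(abs(d) for d in diffs) / len(diffs),
        # Error norm relative to the reference norm (undefined for an all-zero reference).
        "relative_l2_error": diff_norm / ref_norm if ref_norm > 0 else float("inf"),
        # Direction agreement between the two outputs, independent of scale.
        "cosine_similarity": (
            dot / (ref_norm * cand_norm)
            if ref_norm > 0 and cand_norm > 0
            else float("nan")
        ),
    }


# Toy values for illustration only (not taken from the model outputs above).
metrics = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.5])
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

A relative error close to zero with a cosine similarity close to 1 would suggest ordinary precision noise, while values far from that point to a real numerical mismatch.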

Additional Tests

  1. I loaded the model with FP16 weights before compiling, using the following code, but the output differences remained significant:

    model = BartModel.from_pretrained('facebook/bart-base', torch_dtype=torch.float16)

  2. I also enabled "use_fp32_acc" and "use_explicit_typing", but the differences persisted:

    model = torch.compile(
        model,
        backend="torch_tensorrt",
        options={
            "truncate_long_and_double": True,
            "enabled_precisions": {torch.float16, torch.float32},
            "use_fp32_acc": True,
            "use_explicit_typing": True,
        },
        dynamic=False,
    )
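
For a rough sense of scale, FP16 stores a 10-bit mantissa, so rounding a single normal-range value to half precision introduces a relative error of at most about 2^-11 (roughly 4.9e-4). The sketch below is an illustration only and not part of the original report: `to_fp16` is a hypothetical helper name, and it relies on the `'e'` format code of Python's `struct` module, which is IEEE 754 binary16. Error accumulated across many layers can exceed a single rounding step, so this is only a reference point.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Arbitrary sample values chosen for illustration.
samples = [0.1, 1.2345, 6.1822, 123.456]
for x in samples:
    rounded = to_fp16(x)
    rel_err = abs(rounded - x) / abs(x)
    print(f"{x:>10} -> {rounded:<12} relative error {rel_err:.2e}")
```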
@chohk88
Collaborator Author

chohk88 commented Oct 19, 2024

@peri044 I tried running the additional tests you suggested, but I’m still seeing large differences in the output, as mentioned above. I would really appreciate it if you could share advice on this issue.
