
No speed improvement between FP16 and INT8 TensorRT models #13433

Open · ingtommi opened this issue Nov 27, 2024 · 10 comments

Labels
bug (Something isn't working), exports (Model exports: ONNX, TensorRT, TFLite, etc.)

Comments

ingtommi commented Nov 27, 2024

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Validation

Bug


When validating my YOLOv5n in both FP16 and INT8 precision, I see no speed improvement for the INT8 version, while accuracy and model size drop (which is fine!). I then checked with trtexec and again got the same latency:
yolov5n.txt.

Since this does not happen with the latest YOLO models (where I see around a 20% latency improvement), my guess is that YOLOv5 has no operations that benefit from INT8 on my architecture (i.e. 16-bit is already fully optimized).
Can you help me understand whether this is true, or whether I am making a mistake somewhere?

Environment

  • YOLO: YOLOv5n v7.0 fine-tuned on custom dataset
  • TensorRT: 8.6.2.3
  • Device: NVIDIA Jetson Orin Nano 8GB

Minimal Reproducible Example

python val.py --weights yolov5n.engine --data data.yaml --batch 16 --task test
python val.py --weights yolov5n-int8.engine --data data.yaml --batch 16 --task test

trtexec --loadEngine=yolov5n.engine --batch=1 --fp16
trtexec --loadEngine=yolov5n-int8.engine --batch=1 --best
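
Per-layer timings can also be dumped to check whether INT8 kernels are actually selected at runtime (a suggestion based on flags available in TensorRT 8.6's trtexec; layer names may be generic unless the engine was built with detailed profiling verbosity):

trtexec --loadEngine=yolov5n-int8.engine --dumpProfile --separateProfileRun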

Additional

Model files: models.zip

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
ingtommi added the bug (Something isn't working) label Nov 27, 2024
UltralyticsAssistant added the exports (Model exports: ONNX, TensorRT, TFLite, etc.) label Nov 27, 2024
UltralyticsAssistant (Member) commented

👋 Hello @ingtommi, thank you for your interest in YOLOv5 🚀!

It looks like you're encountering an issue with performance differences between FP16 and INT8 TensorRT models. Since this appears to be a 🐛 Bug Report, we would appreciate it if you could provide a minimum reproducible example (MRE) to assist us in debugging this issue. This could include specific commands you used, a small sample of your dataset, or any additional logs that might help clarify the problem.

Please also double-check your environment to ensure compatibility:

  • Python version is 3.8 or higher.
  • YOLOv5 dependencies are properly installed using the requirements.txt file.
  • TensorRT and GPU drivers are updated and configured correctly for inference.

For debugging, it might be helpful to test using different hardware or TensorRT versions to see if the issue persists. If this is related to specific YOLOv5 configurations, please share more details about your setup or the customizations you have made.

An Ultralytics engineer will review this shortly and provide further assistance—thank you for your patience! 😊

pderrenger (Member) commented

Thank you for your detailed report and testing effort! Your observation about minimal or no speed improvement with INT8 on YOLOv5 compared to FP16 is valid and may be attributed to hardware and architectural factors. Some architectures, particularly on devices like the Jetson Orin Nano, show limited benefits from INT8 due to high FP16 optimization. YOLOv5's operations might not fully utilize INT8 optimizations compared to newer YOLO versions with refined quantization-aware designs.

If verifying on a different architecture still shows discrepancies, it might indicate that INT8 calibration settings could be suboptimal or the TensorRT INT8 kernel isn't fully leveraged for YOLOv5. For further exploration, ensure calibration data is diverse and representative of deployment inputs. Additionally, testing with dynamic batch sizes or alternate precision configurations (e.g., mixing INT8/FP16) could be insightful.
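
For illustration, a minimal entropy-calibrator sketch using the TensorRT Python API is shown below (the `batches` iterable, batch shape, and cache filename are placeholders rather than a drop-in implementation; feed it preprocessed images that match your deployment input distribution):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds representative, preprocessed batches to TensorRT during INT8 calibration."""

    def __init__(self, batches, batch_size=16, cache_file="calib.cache"):
        super().__init__()
        self.batches = iter(batches)  # iterable of float32 arrays, e.g. (16, 3, 640, 640)
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.device_input = None  # device buffer, allocated lazily on the first batch

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches), dtype=np.float32)
        except StopIteration:
            return None  # tells TensorRT the calibration data is exhausted
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse a previous calibration run if a cache file exists.
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The calibration cache lets TensorRT skip recalibration on subsequent builds, which also keeps the resulting scale factors reproducible across runs.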

Let us know if you see different outcomes or need additional guidance! For reference, you can explore this TensorRT guide for further optimization techniques.


Y-T-G commented Nov 28, 2024

YOLOv5 doesn't support INT8 TensorRT exports.

ingtommi (Author) commented

@Y-T-G I do not use this repo for the TensorRT export; rather, I convert the model to ONNX and then use a custom script to build the engines (and to calibrate + quantize for INT8). The script is based on the TensorRT Python APIs, and you can check it here.
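
In outline, the build step does roughly the following (a simplified sketch rather than the exact script; the function name and flag handling here are illustrative). For INT8 it takes a calibrator like the `IInt8EntropyCalibrator2` sketched in the earlier comment:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def build_engine(onnx_path, int8=False, calibrator=None):
    """Parse an ONNX model and build a serialized TensorRT engine (FP16, optionally INT8)."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)      # FP16 baseline
    if int8:
        config.set_flag(trt.BuilderFlag.INT8)  # additionally allow INT8 kernels
        config.int8_calibrator = calibrator    # e.g. an IInt8EntropyCalibrator2 instance
    return builder.build_serialized_network(network, config)
```

Note that with both flags set, TensorRT still picks the fastest kernel per layer, so an "INT8" engine can legitimately end up running many layers in FP16.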


Y-T-G commented Nov 28, 2024

Does the benchmark with trtexec show a difference?

ingtommi (Author) commented

@Y-T-G no, you can check it yourself in the txt file I attached above.


Y-T-G commented Nov 28, 2024

It's probably not a bug, then.

ingtommi (Author) commented

@Y-T-G Yes, but I found nothing similar online (no one comparing YOLOv5 FPS in FP16 vs. INT8), so I had to ask...


Y-T-G commented Nov 28, 2024

ingtommi (Author) commented

@Y-T-G Yeah, sorry, I had found that one too (it seems to be the only one). Their 10% is better than my 0%, but he also sees little difference in memory, while I go from 6.3 MB (FP16) to 4.7 MB (INT8).
