
No speed improvement between FP16 and INT8 TensorRT models #13433

Open · ingtommi opened this issue Nov 27, 2024 · 10 comments

Labels
bug (Something isn't working), exports (Model exports: ONNX, TensorRT, TFLite, etc.)

Comments

ingtommi commented Nov 27, 2024

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Validation

Bug


When validating my YOLOv5n in both FP16 and INT8 precision, I see no speed improvement for the INT8 version, while accuracy and model size drop (which is fine!). I then checked with trtexec and again got the same latency:
yolov5n.txt.

Since this does not happen with the latest YOLO models (where I see around a 20% latency improvement), my guess is that YOLOv5 has no operations that benefit from INT8 on my architecture (i.e. 16-bit is already fully optimized).
Can you help me understand whether this is true, or whether I am making a mistake somewhere?

Environment

  • YOLO: YOLOv5n v7.0 fine-tuned on custom dataset
  • TensorRT: 8.6.2.3
  • Device: NVIDIA Jetson Orin Nano 8GB

Minimal Reproducible Example

python val.py --weights yolov5n.engine --data data.yaml --batch 16 --task test
python val.py --weights yolov5n-int8.engine --data data.yaml --batch 16 --task test

trtexec --loadEngine=yolov5n.engine --batch=1 --fp16
trtexec --loadEngine=yolov5n-int8.engine --batch=1 --best
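
Per-layer timings can also be dumped to check whether INT8 kernels are actually selected at runtime (a suggestion based on flags available in TensorRT 8.6's trtexec; layer names may be generic unless the engine was built with detailed profiling verbosity):

trtexec --loadEngine=yolov5n-int8.engine --dumpProfile --separateProfileRun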

Additional

Model files: models.zip

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
ingtommi added the bug (Something isn't working) label Nov 27, 2024
UltralyticsAssistant added the exports (Model exports: ONNX, TensorRT, TFLite, etc.) label Nov 27, 2024
UltralyticsAssistant (Member) commented

👋 Hello @ingtommi, thank you for your interest in YOLOv5 🚀!

It looks like you're encountering an issue with performance differences between FP16 and INT8 TensorRT models. Since this appears to be a 🐛 Bug Report, we would appreciate it if you could provide a minimum reproducible example (MRE) to assist us in debugging this issue. This could include specific commands you used, a small sample of your dataset, or any additional logs that might help clarify the problem.

Please also double-check your environment to ensure compatibility:

  • Python version is 3.8 or higher.
  • YOLOv5 dependencies are properly installed using the requirements.txt file.
  • TensorRT and GPU drivers are updated and configured correctly for inference.

For debugging, it might be helpful to test using different hardware or TensorRT versions to see if the issue persists. If this is related to specific YOLOv5 configurations, please share more details about your setup or the customizations you have made.

An Ultralytics engineer will review this shortly and provide further assistance—thank you for your patience! 😊

pderrenger (Member) commented

Thank you for your detailed report and testing effort! Your observation about minimal or no speed improvement with INT8 on YOLOv5 compared to FP16 is valid and may be attributed to hardware and architectural factors. Some architectures, particularly on devices like the Jetson Orin Nano, show limited benefits from INT8 due to high FP16 optimization. YOLOv5's operations might not fully utilize INT8 optimizations compared to newer YOLO versions with refined quantization-aware designs.

If verifying on a different architecture still shows discrepancies, it might indicate that INT8 calibration settings could be suboptimal or the TensorRT INT8 kernel isn't fully leveraged for YOLOv5. For further exploration, ensure calibration data is diverse and representative of deployment inputs. Additionally, testing with dynamic batch sizes or alternate precision configurations (e.g., mixing INT8/FP16) could be insightful.
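
For illustration, a minimal entropy-calibrator sketch using the TensorRT Python API is shown below (the `batches` iterable, batch shape, and cache filename are placeholders rather than a drop-in implementation; feed it preprocessed images that match your deployment input distribution):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds representative, preprocessed batches to TensorRT during INT8 calibration."""

    def __init__(self, batches, batch_size=16, cache_file="calib.cache"):
        super().__init__()
        self.batches = iter(batches)  # iterable of float32 arrays, e.g. (16, 3, 640, 640)
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.device_input = None  # device buffer, allocated lazily on the first batch

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches), dtype=np.float32)
        except StopIteration:
            return None  # tells TensorRT the calibration data is exhausted
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse a previous calibration run if a cache file exists.
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The calibration cache lets TensorRT skip recalibration on subsequent builds, which also keeps the resulting scale factors reproducible across runs.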

Let us know if you see different outcomes or need additional guidance! For reference, you can explore this TensorRT guide for further optimization techniques.


Y-T-G commented Nov 28, 2024

YOLOv5 doesn't support INT8 TensorRT exports.

ingtommi (Author) commented

@Y-T-G I do not use this repo for the TensorRT export; rather, I convert the model to ONNX and then use a custom script to build the engines (and to calibrate + quantize for INT8). The script is based on the TensorRT Python APIs, and you can check it here.
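
In outline, the build step does roughly the following (a simplified sketch rather than the exact script; the function name and flag handling here are illustrative). For INT8 it takes a calibrator like the `IInt8EntropyCalibrator2` sketched in the earlier comment:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def build_engine(onnx_path, int8=False, calibrator=None):
    """Parse an ONNX model and build a serialized TensorRT engine (FP16, optionally INT8)."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)      # FP16 baseline
    if int8:
        config.set_flag(trt.BuilderFlag.INT8)  # additionally allow INT8 kernels
        config.int8_calibrator = calibrator    # e.g. an IInt8EntropyCalibrator2 instance
    return builder.build_serialized_network(network, config)
```

Note that with both flags set, TensorRT still picks the fastest kernel per layer, so an "INT8" engine can legitimately end up running many layers in FP16.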


Y-T-G commented Nov 28, 2024

Does the benchmark with trtexec show a difference?

ingtommi (Author) commented

@Y-T-G no, you can check it yourself in the txt file I attached above.


Y-T-G commented Nov 28, 2024

It's probably not a bug, then.

ingtommi (Author) commented

@Y-T-G Yes, but I found nothing similar online (no one comparing YOLOv5 FPS in FP16 vs. INT8), so I had to ask...


Y-T-G commented Nov 28, 2024

ingtommi (Author) commented

@Y-T-G Yeah, sorry, I had found that one too (it seems to be the only one). Their 10% is better than my 0%, but he also sees little difference in memory, while I go from 6.3 MB (FP16) to 4.7 MB (INT8).
