partial_quantization (last commit ca6ba2e, "Format code with pre-commit", by triple-Mu and Chilicyy, Oct 12, 2022)

Partial Quantization

The accuracy of YOLOv6s degrades heavily after conventional post-training quantization (PTQ), dropping from 42.4% to 35.6% mAP, which is unacceptable. To resolve this, we propose partial quantization: we first analyze the quantization sensitivity of every layer, and then keep the most sensitive layers in full precision as a compromise.
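The layer-selection step can be sketched in a few lines of Python. This is a minimal sketch, not the code in `partial_quant.py`. It assumes each layer already has a sensitivity score equal to the mAP drop caused by quantizing that layer alone (larger means more sensitive). The function name, layer names, and scores are hypothetical.

```python
from typing import Dict, List


def select_layers_to_skip(sensitivity: Dict[str, float], num_skip: int) -> List[str]:
    """Return the names of the `num_skip` most quantization-sensitive layers.

    Assumption: sensitivity[name] is the mAP drop observed when only that
    layer is quantized, so a larger score means a more sensitive layer.
    """
    # Sort layer names by score, highest (most sensitive) first.
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    # These layers stay in full precision; all others get quantized.
    return ranked[:num_skip]


# Hypothetical layer names and scores, for illustration only.
scores = {
    "stem.conv": 0.9,
    "neck.conv3": 0.1,
    "head.reg_pred": 1.7,
    "backbone.conv2": 0.2,
}
skip = select_layers_to_skip(scores, num_skip=2)
print(skip)  # ['head.reg_pred', 'stem.conv']
```

In a `pytorch_quantization` model, keeping a selected layer in full precision would plausibly mean calling `disable()` on that module's `_input_quantizer` and `_weight_quantizer`. This is an assumption about the mechanism, not something stated in this README.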

With partial quantization, we reach 42.1% mAP, a loss of only 0.3%, while the throughput of the partially quantized model is about 1.56 times that of the FP16 model at a batch size of 32. The method therefore achieves a good trade-off between accuracy and throughput.
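Worked out in percentage points from the three mAP figures above, partial quantization recovers most of the accuracy that plain PTQ loses:

```latex
\begin{aligned}
\Delta_{\text{PTQ}}     &= 42.4 - 35.6 = 6.8 \\
\Delta_{\text{partial}} &= 42.4 - 42.1 = 0.3 \\
\text{recovered}        &= \frac{42.1 - 35.6}{42.4 - 35.6} = \frac{6.5}{6.8} \approx 95.6\%
\end{aligned}
```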

Prerequisites

pip install --extra-index-url=https://pypi.ngc.nvidia.com --trusted-host pypi.ngc.nvidia.com nvidia-pyindex
pip install --extra-index-url=https://pypi.ngc.nvidia.com --trusted-host pypi.ngc.nvidia.com pytorch_quantization
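A quick way to confirm that the install succeeded is shown below as a minimal sketch. It only checks that the `pytorch_quantization` module can be located on the import path, using the standard library's `importlib.util.find_spec`. The helper name `has_pytorch_quantization` is hypothetical.

```python
import importlib.util


def has_pytorch_quantization() -> bool:
    """Return True if the pytorch_quantization package can be located on the import path."""
    # find_spec returns None for a top-level module that is not installed.
    return importlib.util.find_spec("pytorch_quantization") is not None


if __name__ == "__main__":
    if has_pytorch_quantization():
        print("pytorch_quantization is installed")
    else:
        print("pytorch_quantization not found; re-run the pip commands above")
```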

Sensitivity analysis

Use the following command to run the sensitivity analysis. Because 128 images are randomly sampled from the training set on each run, the resulting sensitivity file will differ slightly from run to run.

 python3 sensitivity_analyse.py --weights yolov6s_reopt.pt \
                                --batch-size 32 \
                                --batch-number 4 \
                                --data-root train_data_path
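The analysis can be sketched as two steps: draw a calibration subset, then score each layer. This is a minimal sketch under stated assumptions, not the logic of `sensitivity_analyse.py`. It assumes the 128 images correspond to `--batch-size` × `--batch-number` (32 × 4). It also assumes that sensitivity is measured as the mAP drop when only one layer is quantized, which is one common definition that the script may not share. `evaluate` is a hypothetical callback standing in for quantize-and-evaluate. The toy data lets the sketch run without a model or dataset. Passing a fixed `seed` shows how the draw, and therefore the sensitivity file, could be made repeatable.

```python
import random
from typing import Callable, Dict, List, Sequence


def sample_calibration_images(paths: Sequence[str], batch_size: int,
                              batch_number: int, seed: int = 0) -> List[str]:
    """Draw batch_size * batch_number distinct image paths at random.

    A fixed seed makes the draw (and any result derived from it) repeatable.
    """
    n = batch_size * batch_number
    return random.Random(seed).sample(list(paths), n)


def analyse_sensitivity(layers: Sequence[str],
                        evaluate: Callable[[str], float],
                        baseline_map: float) -> Dict[str, float]:
    """Return the mAP drop caused by quantizing each layer on its own.

    `evaluate(layer)` is a hypothetical callback returning mAP with only
    `layer` quantized and every other layer left in full precision.
    """
    return {layer: baseline_map - evaluate(layer) for layer in layers}


# Toy stand-ins: 1000 fake image paths and made-up per-layer mAP values.
images = [f"img_{i:04d}.jpg" for i in range(1000)]
calib = sample_calibration_images(images, batch_size=32, batch_number=4, seed=0)
print(len(calib))  # 128

fake_map = {"stem.conv": 41.5, "neck.conv3": 42.3, "head.reg_pred": 40.7}
drops = analyse_sensitivity(list(fake_map), lambda layer: fake_map[layer],
                            baseline_map=42.4)

# Print layers from most to least sensitive.
for layer, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{layer}: -{drop:.1f} mAP")
```

The sorted drops are exactly the kind of score the layer-selection step needs: the layers at the top of this list are the candidates to keep in full precision.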