This example loads an image classification model exported from PyTorch and confirms its accuracy and speed on the ImageNet ILSVRC2012 validation dataset. You need to download this dataset yourself.
pip install neural-compressor
pip install -r requirements.txt
Note: this example is validated against specific ONNX Runtime versions.
Use the tf2onnx tool to convert the TFLite model to an ONNX model.
wget https://github.com/mlcommons/mobile_models/blob/main/v0_7/tflite/mobilenet_edgetpu_224_1.0_float.tflite
python -m tf2onnx.convert --opset 11 --tflite mobilenet_edgetpu_224_1.0_float.tflite --output mobilenet_v3.onnx
Download the ImageNet ILSVRC2012 validation dataset.
Download the labels:
wget http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
tar -xvzf caffe_ilsvrc12.tar.gz val.txt
Neural Compressor offers quantization and benchmark diagnosis. Adding the diagnosis parameter to the quantization or benchmark config provides additional details useful for diagnostics.
from neural_compressor.config import PostTrainingQuantConfig, BenchmarkConfig

config = PostTrainingQuantConfig(
    diagnosis=True,
    ...
)

config = BenchmarkConfig(
    diagnosis=True,
    ...
)
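A minimal sketch of how a diagnosis-enabled config is typically wired into Neural Compressor's post-training quantization entry point. The model path and calibration dataloader are placeholders you must supply; only PostTrainingQuantConfig, quantization.fit, and diagnosis=True come from the Neural Compressor API used here:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# diagnosis=True makes fit() collect per-op details for later inspection.
config = PostTrainingQuantConfig(diagnosis=True)

# Placeholders: your converted ONNX model and a calibration dataloader
# built from the ImageNet validation images.
q_model = quantization.fit(
    model="mobilenet_v3.onnx",
    conf=config,
    calib_dataloader=calib_dataloader,
)
q_model.save("mobilenet_v3_int8.onnx")
```

In this example the run_quant.sh script below performs the equivalent steps for you; the sketch only shows where the diagnosis flag fits.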
Quantize model with QLinearOps:
# --input_model: model path as *.onnx
bash run_quant.sh --input_model=path/to/model \
                  --dataset_location=/path/to/imagenet \
                  --label_path=/path/to/val.txt \
                  --output_model=path/to/save
Quantize model with QDQ mode:
# --input_model: model path as *.onnx
bash run_quant.sh --input_model=path/to/model \
                  --dataset_location=/path/to/imagenet \
                  --label_path=/path/to/val.txt \
                  --output_model=path/to/save \
                  --quant_format=QDQ
# --input_model: model path as *.onnx
bash run_benchmark.sh --input_model=path/to/model \
                      --dataset_location=/path/to/imagenet \
                      --label_path=/path/to/val.txt \
                      --mode=performance # or accuracy
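Accuracy mode reports top-1 accuracy: the fraction of images whose highest-scoring class matches the ground-truth index from val.txt. A minimal sketch of the metric itself (the function name top1_accuracy is illustrative, not part of the benchmark script):

```python
# Top-1 accuracy: an image counts as correct when the argmax of its
# class scores equals the ground-truth class index.
def top1_accuracy(scores, labels):
    correct = sum(
        1 for s, y in zip(scores, labels)
        if max(range(len(s)), key=s.__getitem__) == y
    )
    return correct / len(labels)

# Tiny worked example: three "images" over four classes.
scores = [[0.1, 0.7, 0.1, 0.1],   # predicts class 1
          [0.5, 0.2, 0.2, 0.1],   # predicts class 0
          [0.0, 0.1, 0.2, 0.7]]   # predicts class 3
labels = [1, 2, 3]                # second prediction is wrong
print(top1_accuracy(scores, labels))  # 2 of 3 correct
```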