This repository contains code for performing inference using a YOLO (You Only Look Once) model that has been converted to ONNX (Open Neural Network Exchange) format.
The primary goal of this project is to enable efficient and flexible deployment of YOLO models for object detection tasks, especially on STM32 chips.
- ONNX model conversion and quantization to QInt8 (see the sketch after this list)
- Export of the main core of YOLOv7 to ONNX
- Python inference engine for a quantized and compressed YOLOv7
- C++ inference engine for a quantized and compressed YOLOv7 (coming soon)
- Application of various STM32ai toolbox features to a YOLO model (coming soon)
- Generation of static C code from your model and deployment on low-cost microcontrollers (coming soon)
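As a rough illustration of the quantization step, the sketch below uses ONNX Runtime's dynamic quantization API to produce QInt8 weights. The file names are placeholders, and the repository's own scripts may use different paths or additional calibration steps.

```python
# Minimal sketch: quantize an exported YOLOv7 ONNX model to QInt8 weights.
# File names are placeholders; the repository's scripts may differ.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="yolov7-tiny.onnx",         # model previously exported from PyTorch
    model_output="yolov7-tiny-qint8.onnx",  # quantized model written here
    weight_type=QuantType.QInt8,            # quantize weights to signed 8-bit integers
)
```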
The intended pipeline is as follows: train your object detector => export it to ONNX => quantize / compress your model using ONNXRUNTIME or STM32ai => run inference in Python on a "large enough" device => deploy to a microcontroller.
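To make the Python inference step concrete, here is a minimal ONNX Runtime session sketch. The model path, input size (640x640), normalization, and output handling are assumptions based on a typical YOLOv7 export, not a description of this repository's inference engine.

```python
# Minimal sketch of the Python inference step with ONNX Runtime.
# Input size, normalization, and output layout are assumed for a typical
# YOLOv7 export; the repository's engine may differ.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "yolov7-tiny-qint8.onnx",              # placeholder path to the quantized model
    providers=["CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name

# Dummy 640x640 RGB image in NCHW layout, normalized to [0, 1].
image = np.random.rand(1, 3, 640, 640).astype(np.float32)

outputs = session.run(None, {input_name: image})
print([o.shape for o in outputs])          # raw predictions, decoded downstream
```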
The following table will contain benchmark results for tiny-yolov7 on different devices with various optimizations (TODO).
Much of the software can be reused for other neural networks, as long as their layers are supported by ONNXRUNTIME and STM32ai.
This project is licensed under the GNU License.