- Added the post-training weight quantization (W8A16) algorithm `RoundToNearest`, which losslessly compresses the parameters of the Llama2 7B/13B/70B and Baichuan2 13B networks by more than 40%.
- Added `PTQConfig` to configure the post-training quantization algorithm.
- Added the `PTQMode` enumeration class, which can be configured in `PTQConfig` and is used to distinguish between the two phases of the quantization algorithm: the quantization phase and the deployment phase.
- Added the `BackendTarget` enumeration class, which can be configured in `PTQConfig` to indicate the backend to which the quantized network will eventually be deployed. For example, `BackendTarget.ASCEND` indicates that it will eventually be deployed to the Ascend backend of MindSpore. A usage sketch of these classes follows this list.
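A minimal sketch of how these pieces fit together is shown below. The module paths, keyword names, and the pre-trained network are assumptions based on these notes; consult the PTQ documentation for the authoritative usage.

```python
# Sketch: W8A16 weight quantization with RoundToNearest (paths and signatures are assumptions).
import mindspore.nn as nn
from mindspore_gs.ptq import PTQConfig, PTQMode, RoundToNearest
from mindspore_gs.common import BackendTarget

float_net = nn.Dense(16, 16)  # stand-in for a real pre-trained network, e.g. Llama2 7B

# Quantization phase: rewrite the float network and quantize its weights to 8 bits.
quant_cfg = PTQConfig(mode=PTQMode.QUANTIZE, backend=BackendTarget.ASCEND)
rtn = RoundToNearest(quant_cfg)
quant_net = rtn.apply(float_net)    # insert the weight-quantization structure
quant_net = rtn.convert(quant_net)  # convert for checkpoint export and deployment

# Deployment phase: construct the same algorithm in DEPLOY mode before loading the
# quantized checkpoint and running inference on the Ascend backend.
deploy_cfg = PTQConfig(mode=PTQMode.DEPLOY, backend=BackendTarget.ASCEND)
```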
Thanks goes to these wonderful people:
zhuxiaoxiong, hangangqiang
Contributions of any kind are welcome!
- Fixed a problem where SCOP algorithm training failed to converge.
Thanks goes to these wonderful people:
hangangqiang, yangruoqi713, kevinkunkun.
Contributions of any kind are welcome!
- [STABLE] The SLB (Searching for Low-Bit Weights in Quantized Neural Networks) QAT algorithm now supports BatchNorm calibration. Invoke the `set_enable_bn_calibration` API to enable it. For a network with BatchNorm layers, BatchNorm calibration reduces the accuracy drop caused by the SLB quantization algorithm. (!150)
- [STABLE] We verified the quantization effect of the SimQAT (Simulated Quantization Aware Training) algorithm and the SLB algorithm on the ResNet network with the ImageNet2012 dataset. For details, please refer to the MindSpore Models readme.
- [STABLE] The SimQAT algorithm now supports inference on the MindSpore Lite backend. We quantized the LeNet network with SimQAT and deployed it on an ARM CPU. For details, please refer to Deployment Effect.
- The SLB algorithm adds the `set_enable_bn_calibration` interface to enable or disable BatchNorm calibration. (!117)
- Added the `convert` interface to the algorithm base class, which converts a training network into an inference network; the resulting network can be exported to a MindIR file for deployment. For details, please refer to Model Deployment. (!176)
- Added the `set_save_mindir` interface to the algorithm base class, which configures automatic export of MindIR after training. For details, please refer to Model Deployment. A usage sketch of these interfaces follows this list. (!168)
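The sketch below shows how these interfaces can be combined when training with SLB and preparing a network for deployment. The class and module names (`SlbQuantAwareTraining` and its path) and the placeholder network are assumptions drawn from these notes rather than verified API.

```python
# Sketch: SLB QAT with BatchNorm calibration and MindIR export (names are assumptions).
import mindspore.nn as nn
from mindspore_gs.quantization.slb import SlbQuantAwareTraining

float_net = nn.Conv2d(3, 16, 3)        # stand-in for a real pre-trained network
slb = SlbQuantAwareTraining()
slb.set_enable_bn_calibration(True)    # mitigate accuracy drop on networks with BatchNorm layers
slb.set_save_mindir(True)              # automatically export MindIR after training

qat_net = slb.apply(float_net)         # rewrite the network with SLB quantization structure
# ... train qat_net as usual ...
infer_net = slb.convert(qat_net)       # convert the training network into an inference network
# infer_net can now be exported to a MindIR file and deployed, e.g. on MindSpore Lite.
```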
- [STABLE] Refactored the SimQAT algorithm code and fixed bugs such as loss of activation operators, loss of pre-trained parameters, and redundant simulated quantization operators.
Thanks goes to these wonderful people:
liuzhicheng01, fuzhongqian, hangangqiang, yangruoqi713, kevinkunkun.
Contributions of any kind are welcome!
- [STABLE] The SLB (Searching for Low-Bit Weights in Quantized Neural Networks) QAT algorithm implements a built-in temperature adjustment callback to simplify use of the algorithm. Users no longer need to manually write the temperature adjustment logic in the training script; the original temperature adjustment function can be realized through the algorithm configuration interface. Note that this is an incompatible change.
- [STABLE] Fixed an AllReduce bug during distributed training so that the SLB QAT algorithm now supports distributed training.
- Added the `callbacks` interface to the algorithm base class, which returns the callback logic of the algorithm to be invoked during the training process. To make it easy for different algorithms to implement their own callback logic, this method takes variable arguments. (!117)
- The SLB algorithm adds the `set_epoch_size` interface, which is used to configure the total number of training epochs and is required by the temperature adjustment callback logic. (!117)
- The SLB algorithm adds the `set_has_trained_epoch` interface. If a pre-trained checkpoint is used in training, it configures the number of pre-trained epochs corresponding to that checkpoint and is required by the temperature adjustment callback logic. (!117)
- The SLB algorithm adds the `set_t_start_val` interface, which is used to configure the initial temperature value of the temperature adjustment mechanism. (!117)
- The SLB algorithm adds the `set_t_start_time` interface, which is used to configure when the temperature adjustment mechanism starts to take effect. (!117)
- The SLB algorithm adds the `set_t_end_time` interface, which is used to configure when the temperature adjustment mechanism stops taking effect. (!117)
- The SLB algorithm adds the `set_t_factor` interface, which is used to configure the temperature adjustment factor of the temperature adjustment mechanism. A configuration sketch follows this list. (!117)
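Taken together, the temperature adjustment can now be configured entirely through the algorithm object, roughly as in the sketch below; the class name, the concrete values, and the training call shown in the comments are assumptions for illustration only.

```python
# Sketch: configuring SLB's built-in temperature adjustment (names and values are assumptions).
import mindspore.nn as nn
from mindspore_gs.quantization.slb import SlbQuantAwareTraining

slb = SlbQuantAwareTraining()
slb.set_epoch_size(100)        # total number of training epochs
slb.set_has_trained_epoch(0)   # epochs already covered by the pre-trained checkpoint, if any
slb.set_t_start_val(1.0)       # initial temperature value
slb.set_t_start_time(0.2)      # when the temperature adjustment mechanism starts to take effect
slb.set_t_end_time(0.6)        # when the temperature adjustment mechanism stops taking effect
slb.set_t_factor(1.2)          # temperature adjustment factor

float_net = nn.Conv2d(3, 16, 3)  # stand-in for a real pre-trained network
qat_net = slb.apply(float_net)

# `callbacks` on the base class returns the algorithm's callback list (here the temperature
# scheduler); pass it to Model.train together with any user callbacks, for example:
#   model.train(100, dataset, callbacks=[*slb.callbacks(), loss_monitor])
```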
Thanks goes to these wonderful people:
ghostnet, liuzhicheng01, fuzhongqian, hangangqiang, cjh9368, yangruoqi713, kevinkunkun.
Contributions of any kind are welcome!
MindSpore Golden Stick is a model compression algorithm set jointly designed and developed by Huawei's Noah team and Huawei's MindSpore team. MindSpore Golden Stick provides a unified user interface that allows users to apply model compression algorithms such as quantization and pruning in a unified and convenient manner (a minimal usage sketch follows the list below). MindSpore Golden Stick also provides front-end network modification capabilities to reduce algorithm development costs. MindSpore Golden Stick provides three algorithms in the current version.
- [BETA] Provides a quantization aware training algorithm named SimQAT (Simulated Quantization Aware Training), which is the most basic quantization aware training algorithm.
- [BETA] Provides a quantization aware training algorithm called SLB (Searching for Low-Bit Weights in Quantized Neural Networks), which is a nonlinear, high-precision quantization aware training algorithm with obvious advantages in low-bit quantization.
- [STABLE] Provides a pruning algorithm named SCOP (Scientific Control for Reliable Neural Network Pruning), which is a high-precision structured pruning algorithm and is mainly used in CV networks at present.
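As a rough illustration of the unified interface, each algorithm is an object that rewrites a network through the same `apply` entry point. The class and module names below are assumptions and may differ from the released API.

```python
# Sketch: the unified apply() pattern shared by the algorithms (names are assumptions).
import mindspore.nn as nn
from mindspore_gs.quantization.simulated_quantization import SimulatedQuantizationAwareTraining

net = nn.Conv2d(3, 16, 3)                    # stand-in for a real network such as LeNet or ResNet
algo = SimulatedQuantizationAwareTraining()  # the SLB and SCOP algorithms follow the same pattern
compressed_net = algo.apply(net)             # returns the network rewritten for compression
```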
Thanks goes to these wonderful people:
ghostnet, liuzhicheng01, fuzhongqian, hangangqiang, cjh9368.
Contributions of any kind are welcome!