This is my own implementation and understanding of the paper Deep Compression, developed in June/July 2017 (except Huffman coding =D ); very simple reports were produced in November. The weight-sharing stage still needs to be optimized: it is extremely slow, but it does not require many epochs to converge.
Learn PyTorch for low-level (gradient modification) and high-level (large networks) implementation. Learn a very efficient network optimization method (2015). Currently, there is a newer optimization method that I would like to learn: MorphNet =)
All parameters are detailed in main.py; just run:
python main.py
The recommended number of epochs for pruning is 25, while 5 epochs are enough for the sharing stage.
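To make the pruning stage more concrete, here is a minimal sketch of magnitude pruning with masked gradients in PyTorch. It is not the code from main.py: the helper `magnitude_mask`, the toy `nn.Linear` layer, the 50% sparsity, and the random data are all assumptions chosen for illustration. The idea is that the smallest-magnitude weights are zeroed once, and a gradient hook keeps them at zero during retraining.

```python
import torch
import torch.nn as nn


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Hypothetical helper: 0/1 mask that drops the `sparsity` fraction
    of weights with the smallest absolute value."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    # The k-th smallest absolute value is used as the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).to(weight.dtype)


torch.manual_seed(0)
layer = nn.Linear(8, 4)  # toy layer standing in for a VGG19 layer

# Prune once, then mask the gradient so pruned weights receive no update.
mask = magnitude_mask(layer.weight.data, sparsity=0.5)
layer.weight.data.mul_(mask)
layer.weight.register_hook(lambda grad: grad * mask)

# Plain SGD (no momentum, no weight decay), so a zero gradient means no change.
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
x = torch.randn(16, 8)       # random inputs, for illustration only
target = torch.randn(16, 4)  # random targets, for illustration only

for _ in range(3):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(layer(x), target)
    loss.backward()
    optimizer.step()

# Pruned positions are still exactly zero after retraining.
pruned_still_zero = bool((layer.weight.data[mask == 0] == 0).all())
```

Note that an optimizer with momentum or weight decay could still move pruned weights despite the masked gradient; in that case the mask would also have to be re-applied to the weights after each step.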
Trained models were based on VGG19 and custom versions of it; however, the results presented here are for VGG19 (with batch normalization) trained on CIFAR10. The experiments were rerun to check that variable and gradient manipulation works correctly with PyTorch 1.0.1.
The VGG19 results reported in this repository with k shared clusters keep almost the same accuracy as the original network, as the original paper argued. All optimization results were obtained with 25 epochs of training overall. The number of pruning iterations was also set to 25 (just for convenience).
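The role of k in the sharing stage can be illustrated with a small sketch: the nonzero weights of a layer are clustered with 1-D k-means, and each weight is replaced by its cluster centroid, so the layer holds at most k distinct nonzero values. This is an assumption-laden illustration, not the repository's implementation: the function `share_weights`, the linear centroid initialization, the iteration count, and the random test matrix are all hypothetical choices, and the centroid fine-tuning step is omitted.

```python
import torch


def share_weights(weight: torch.Tensor, k: int, iters: int = 20):
    """Hypothetical helper: cluster the nonzero weights into k shared values
    using 1-D k-means with linearly spaced initial centroids."""
    nonzero = weight != 0
    values = weight[nonzero]
    centroids = torch.linspace(values.min().item(), values.max().item(), k)

    for _ in range(iters):
        # Assign every weight to its nearest centroid.
        assign = (values.unsqueeze(1) - centroids.unsqueeze(0)).abs().argmin(dim=1)
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = values[assign == j]
            if members.numel() > 0:
                centroids[j] = members.mean()

    # Final assignment, then write the shared values back; zeros stay zero.
    assign = (values.unsqueeze(1) - centroids.unsqueeze(0)).abs().argmin(dim=1)
    shared = weight.clone()
    shared[nonzero] = centroids[assign]
    return shared, centroids


torch.manual_seed(0)
w = torch.randn(64, 64)
w[w.abs() < 0.5] = 0.0  # crude stand-in for an already pruned layer

shared, centroids = share_weights(w, k=4)
num_unique = shared[shared != 0].unique().numel()
```

The pairwise distance matrix has one row per nonzero weight and one column per centroid, which is fine for a small layer but becomes memory-hungry for large ones.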
Network | Original | Pruned 25 | Shared k=4 | Shared k=9 | Shared k=13 | Shared k=35 |
---|---|---|---|---|---|---|
VGG19_BN | 92.22 | 92.18 | 90.93 | 91.86 | 92.23 | (soon =D) |
Some visual results of the pruned weights are shown below:
Usage updated...
Presentation and Figures were uploaded.
Code now works with PyTorch 1.0.1. Trained models differ from the original ones reported in the presentation.