"Sparse Logits Suffice to Fail Knowledge Distillation"
Haoyu Ma, Yifan Huang, Hao Tang, Chenyu You, Deying Kong, Xiaohui Xie
In ICLR 2022 Workshop PAIR^2Struct
This is an extension of the "nasty teacher" from the ICLR 2021 paper "Undistillable: Making A Nasty Teacher That CANNOT teach students".
We use PyTorch 1.7.1 and CUDA 10.1. Other PyTorch and CUDA versions should also work.
Then install the remaining packages with
pip install -r requirements.txt
For example, to train a ResNet18 on CIFAR-100 from scratch:
python train_scratch.py --save_path experiments/CIFAR100/baseline/resnet18
After training finishes, you will find training.log and best_model.tar in that directory.
To train a ShuffleNet-v2 by distilling from a Stingy ResNet18:
python train_kd_stingy.py --save_path experiments/CIFAR100/kd_stingy_resnet18/shufflenetv2
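The idea behind the stingy teacher is that the student distills from sparse soft labels: only the few largest teacher logits per sample are kept, and the rest are suppressed before the usual KD loss is computed. Below is a minimal PyTorch sketch of that idea; the function names, the choice of k, the masking value of -1000, and the temperature are illustrative assumptions, not the exact settings used in train_kd_stingy.py.

import torch
import torch.nn.functional as F

def sparsify_logits(logits, k=2):
    # Keep only the top-k teacher logits per sample and push the rest to a
    # large negative value, so the softened teacher distribution is (nearly) sparse.
    topk_vals, topk_idx = logits.topk(k, dim=1)
    sparse = torch.full_like(logits, -1000.0)
    return sparse.scatter(1, topk_idx, topk_vals)

def stingy_kd_loss(student_logits, teacher_logits, T=4.0, k=2):
    # Standard KL-divergence distillation loss, but computed against the
    # sparsified teacher logits instead of the full ones.
    p_teacher = F.softmax(sparsify_logits(teacher_logits, k) / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * (T * T)

In a training loop, a loss like stingy_kd_loss would typically replace the soft-label distillation term, while the hard-label cross-entropy on the student is left unchanged.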
If you find our code helpful for your research, please cite the following papers:
@inproceedings{
ma2021undistillable,
title={Undistillable: Making A Nasty Teacher That {CANNOT} teach students},
author={Haoyu Ma and Tianlong Chen and Ting-Kuei Hu and Chenyu You and Xiaohui Xie and Zhangyang Wang},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=0zvfm-nZqQs}
}
@inproceedings{
ma2022sparse,
title={Sparse Logits Suffice to Fail Knowledge Distillation},
author={Haoyu Ma and Yifan Huang and Hao Tang and Chenyu You and Deying Kong and Xiaohui Xie},
booktitle={ICLR 2022 Workshop on PAIR{\textasciicircum}2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data},
year={2022},
url={https://openreview.net/forum?id=BxZgduuNDl5}
}