Awesome Knowledge Distillation

A collection of papers on knowledge distillation. The PDF of each paper can be obtained by clicking its title.

Paper list

  1. Spatial knowledge distillation to aid visual reasoning. Aditya, S., Saha, R., Yang, Y. & Baral, C. (2019). WACV.
  2. Knowledge distillation from internal representations. Aguilar, G., Ling, Y., Zhang, Y., Yao, B., Fan, X. & Guo, E. (2020). AAAI.
  3. Compressing GANs using knowledge distillation. Aguinaldo, A., Chiang, P. Y., Gain, A., Patil, A., Pearson, K. & Feizi, S. (2019).
  4. Variational information distillation for knowledge transfer. Ahn, S., Hu, S., Damianou, A., Lawrence, N. D. & Dai, Z. (2019). CVPR.
  5. Emotion recognition in speech using crossmodal transfer in the wild. Albanie, S., Nagrani, A., Vedaldi, A. & Zisserman, A. (2018). ACM MM.
  6. Learning and generalization in overparameterized neural networks going beyond two layers. Allen-Zhu, Z., Li, Y., & Liang, Y. (2019). NeurIPS.
  7. Large scale distributed neural network training through online distillation. Anil, R., Pereyra, G., Passos, A., Ormandi, R., Dahl, G. E. & Hinton, G. E. (2018). ICLR.
  8. On the optimization of deep networks: Implicit acceleration by overparameterization. Arora, S., Cohen, N., & Hazan, E. (2018). ICML.
  9. On knowledge distillation from complex networks for response prediction. Arora, S., Khapra, M. M. & Ramaswamy, H. G. (2019). NAACL-HLT.
  10. Domain adaptation of DNN acoustic models using knowledge distillation. Asami, T., Masumura, R., Yamaguchi, Y., Masataki, H. & Aono, Y. (2017). ICASSP.
  11. N2N learning: Network to network compression via policy gradient reinforcement learning. Ashok, A., Rhinehart, N., Beainy, F. & Kitani, K. M. (2018). ICLR.
  12. Ensemble knowledge distillation for learning improved and efficient networks. Asif, U., Tang, J. & Harrer, S. (2020). ECAI.
  13. Do deep nets really need to be deep? Ba, J. & Caruana, R. (2014). NeurIPS.
  14. Label refinery: Improving ImageNet classification through label progression. Bagherinezhad, H., Horton, M., Rastegari, M. & Farhadi, A. (2018).
  15. Few shot network compression via cross distillation. Bai, H., Wu, J., King, I. & Lyu, M. (2020). AAAI.
  16. Learn spelling from teachers: transferring knowledge from language models to sequence-to-sequence speech recognition. Bai, Y., Yi, J., Tao, J., Tian, Z. & Wen, Z. (2019). Interspeech.
  17. Teacher guided architecture search. Bashivan, P., Tensen, M. & DiCarlo, J. J. (2019). ICCV.
  18. Adversarial network compression. Belagiannis, V., Farshad, A. & Galasso, F. (2018). ECCV.
  19. Representation learning: A review and new perspectives. Bengio, Y., Courville, A., & Vincent, P. (2013). IEEE TPAMI 35(8): 1798–1828.
  20. Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. Bergmann, P., Fauser, M., Sattlegger, D., & Steger, C. (2020). CVPR.
  21. Efficient video classification using fewer frames. Bhardwaj, S., Srinivasan, M. & Khapra, M. M. (2019). CVPR.
  22. Distributed Distillation for On-Device Learning. Bistritz, I., Mann, A., & Bambos, N. (2020). NeurIPS.
  23. Flexible Dataset Distillation: Learn Labels Instead of Images. Bohdal, O., Yang, Y., & Hospedales, T. (2020).
  24. Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks. Boo, Y., Shin, S., Choi, J., & Sung, W. (2021). AAAI.
  25. Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem. Brutzkus, A., & Globerson, A. (2019). ICML.
  26. Model compression. Bucilua, C., Caruana, R. & Niculescu-Mizil, A. (2006). SIGKDD.
  27. Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning. Caccia, M., Rodriguez, P., Ostapenko, O., Normandin, F., Lin, M., Caccia, L., Laradji, I., Rish, I., Lacoste, A., Vazquez, D., & Charlin, L. (2020). NeurIPS.
  28. Transferring knowledge from a RNN to a DNN. Chan, W., Ke, N. R. & Lane, I. (2015).
  29. Data-Free Knowledge Distillation for Object Detection. Chawla, A., Yin, H., Molchanov, P., & Alvarez, J. (2021). WACV.
  30. Distilling knowledge from ensembles of neural networks for speech recognition. Chebotar, Y. & Waters, A. (2016). Interspeech.
  31. Online knowledge distillation with diverse peers. Chen, D., Mei, J. P., Wang, C., Feng, Y. & Chen, C. (2020a). AAAI.
  32. Cross-Layer Distillation with Semantic Calibration. Chen, D., Mei, J. P., Zhang, Y., Wang, C., Wang, Z., Feng, Y., & Chen, C. (2021). AAAI.
  33. Learning efficient object detection models with knowledge distillation. Chen, G., Choi, W., Yu, X., Han, T., & Chandraker, M. (2017). NeurIPS.
  34. Data-Free Learning of Student Networks. Chen, H., Wang, Y., Xu, C., Yang, Z., Liu, C., Shi, B., Xu, C., Xu, C., & Tian, Q. (2019a). ICCV.
  35. Learning student networks via feature embedding. Chen, H., Wang, Y., Xu, C., Xu, C. & Tao, D. (2021). IEEE TNNLS 32(1): 25-35.
  36. Net2Net: Accelerating learning via knowledge transfer. Chen, T., Goodfellow, I. & Shlens, J. (2016). ICLR.
  37. Knowledge distillation with feature maps for image classification. Chen, W. C., Chang, C. C. & Lee, C. R. (2018a). ACCV.
  38. Adversarial distillation for efficient recommendation with external knowledge. Chen, X., Zhang, Y., Xu, H., Qin, Z. & Zha, H. (2018b). ACM TOIS 37(1): 1–28.
  39. A two-teacher framework for knowledge distillation. Chen, X., Su, J. & Zhang, J. (2019b). ISNN.
  40. DarkRank: Accelerating deep metric learning via cross sample similarities transfer. Chen, Y., Wang, N. & Zhang, Z. (2018c). AAAI.
  41. Distilling knowledge learned in BERT for text generation. Chen, Y. C., Gan, Z., Cheng, Y., Liu, J., & Liu, J. (2020b). ACL.
  42. CrDoCo: Pixel-level domain transfer with cross-domain consistency. Chen, Y. C., Lin, Y. Y., Yang, M. H. & Huang, J. B. (2019c). CVPR.
  43. Lifelong Machine Learning, Second Edition. Chen, Z. & Liu, B. (2018). Synthesis Lectures on Artificial Intelligence and Machine Learning 12(3): 1–207.
  44. A Multi-task Mean Teacher for Semi-supervised Shadow Detection. Chen, Z., Zhu, L., Wan, L., Wang, S., Feng, W., & Heng, P. A. (2020c). CVPR.
  45. Model compression and acceleration for deep neural networks: The principles, progress, and challenges. Cheng, Y., Wang, D., Zhou, P. & Zhang, T. (2018). IEEE Signal Proc Mag 35(1): 126–136.
  46. Explaining Knowledge Distillation by Quantifying the Knowledge. Cheng, X., Rao, Z., Chen, Y., & Zhang, Q. (2020). CVPR.
  47. On the efficacy of knowledge distillation. Cho, J. H. & Hariharan, B. (2019). ICCV.
  48. Xception: Deep learning with depthwise separable convolutions. Chollet, F. (2017). CVPR.
  49. Feature-map-level online adversarial knowledge distillation. Chung, I., Park, S., Kim, J. & Kwak, N. (2020). ICML.
  50. BAM! Born-again multitask networks for natural language understanding. Clark, K., Luong, M. T., Khandelwal, U., Manning, C. D. & Le, Q. V. (2019). ACL.
      BinaryConnect: Training deep neural networks with binary weights during propagations. Courbariaux, M., Bengio, Y. & David, J. P. (2015). NeurIPS.
      Moonshine: Distilling with cheap convolutions. Crowley, E. J., Gray, G. & Storkey, A. J. (2018). NeurIPS.
  51. Knowledge distillation across ensembles of multilingual models for low-resource languages. Cui, J., Kingsbury, B., Ramabhadran, B., Saon, G., Sercu, T., Audhkhasi, K. & et al. (2017). ICASSP.
  52. Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition. Cui, Z., Song, T., Wang, Y., & Ji, Q. (2020). NeurIPS.
  53. Defocus Blur Detection via Depth Distillation. Cun, X., & Pun, C. M. (2020). ECCV.
  54. ImageNet: A large-scale hierarchical image database. Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). CVPR.
  55. Exploiting linear structure within convolutional networks for efficient evaluation. Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y. & Fergus, R. (2014). NeurIPS.
  56. BERT: Pre-training of deep bidirectional transformers for language understanding. Devlin, J., Chang, M. W., Lee, K. & Toutanova, K. (2019). NAACL-HLT.
  57. Adaptive regularization of labels. Ding, Q., Wu, S., Sun, H., Guo, J. & Xia, S. T. (2019).
  58. Compact trilinear interaction for visual question answering. Do, T., Do, T. T., Tran, H., Tjiputra, E. & Tran, Q. D. (2019). ICCV.
  59. Teacher supervises students how to learn from partially labeled images for facial landmark detection. Dong, X. & Yang, Y. (2019). ICCV.
  60. Unpaired multi-modal segmentation via knowledge distillation. Dou, Q., Liu, Q., Heng, P. A., & Glocker, B. (2020). IEEE TMI.
  61. Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space. Du, S., You, S., Li, X., Wu, J., Wang, F., Qian, C., & Zhang, C. (2020). NeurIPS.
  62. ShrinkTeaNet: Million-scale lightweight face recognition via shrinking teacher-student networks. Duong, C. N., Luu, K., Quach, K. G. & Le, N. (2019).
  63. Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation. Fakoor, R., Mueller, J. W., Erickson, N., Chaudhari, P., & Smola, A. J. (2020). NeurIPS.
  64. Transferring knowledge across learning processes. Flennerhag, S., Moreno, P. G., Lawrence, N. D. & Damianou, A. (2019). ICLR.
  65. Ensemble distillation for neural machine translation. Freitag, M., Al-Onaizan, Y. & Sankaran, B. (2017).
  66. LRC-BERT: Latent representation Contrastive Knowledge Distillation for Natural Language Understanding. Fu, H., Zhou, S., Yang, Q., Tang, J., Liu, G., Liu, K., & Li, X. (2021). AAAI.
  67. Efficient knowledge distillation from an ensemble of teachers. Fukuda, T., Suzuki, M., Kurata, G., Thomas, S., Cui, J. & Ramabhadran, B. (2017). Interspeech.
  68. Born again neural networks. Furlanello, T., Lipton, Z., Tschannen, M., Itti, L. & Anandkumar, A. (2018). ICML.
  69. An adversarial feature distillation method for audio classification. Gao, L., Mi, H., Zhu, B., Feng, D., Li, Y. & Peng, Y. (2019). IEEE Access 7: 105319–105330.
  70. Residual Error Based Knowledge Distillation. Gao, M., Wang, Y., & Wan, L. (2021). Neurocomputing 433: 154-161.
  71. Privileged modality distillation for vessel border detection in intracoronary imaging. Gao, Z., Chung, J., Abdelrazek, M., Leung, S., Hau, W. K., Xian, Z., Zhang, H., & Li, S. (2020). IEEE TMI 39(5): 1524-1534.
  72. Modality distillation with multiple stream networks for action recognition. Garcia, N. C., Morerio, P. & Murino, V. (2018). ECCV.
  73. Low-resolution face recognition in the wild via selective knowledge distillation. Ge, S., Zhao, S., Li, C. & Li, J. (2018). IEEE TIP 28(4): 2051–2062.
  74. Efficient Low-Resolution Face Recognition via Bridge Distillation. Ge, S., Zhao, S., Li, C., Zhang, Y., & Li, J. (2020). IEEE TIP 29: 6898-6908.
  75. Advancing multi-accented LSTM-CTC speech recognition using a domain specific student-teacher learning paradigm. Ghorbani, S., Bulut, A. E. & Hansen, J. H. (2018). SLTW.
  76. White-to-black: Efficient distillation of black-box adversarial attacks. Gil, Y., Chai, Y., Gorodissky, O. & Berant, J. (2019). NAACL-HLT.
  77. Adversarially robust distillation. Goldblum, M., Fowl, L., Feizi, S. & Goldstein, T. (2020). AAAI.
  78. Teaching semi-supervised classifier via generalized distillation. Gong, C., Chang, X., Fang, M. & Yang, J. (2018). IJCAI.
  79. Label propagation via teaching-to-learn and learning-to-teach. Gong, C., Tao, D., Liu, W., Liu, L., & Yang, J. (2017). IEEE TNNLS 28(6): 1452–1465.
  80. Generative adversarial nets. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). NeurIPS.
  81. Explaining sequence-level knowledge distillation as data-augmentation for neural machine translation. Gordon, M. A. & Duh, K. (2019).
  82. Search for Better Students to Learn Distilled Knowledge. Gu, J., & Tresp, V. (2020). ECAI.
  83. Differentiable Feature Aggregation Search for Knowledge Distillation. Guan, Y., Zhao, P., Wang, B., Zhang, Y., Yao, C., Bian, K., & Tang, J. (2020). ECCV.
  84. Online Knowledge Distillation via Collaborative Learning. Guo, Q., Wang, X., Wu, Y., Yu, Z., Liang, D., Hu, X., & Luo, P. (2020). CVPR.
  85. Cross modal distillation for supervision transfer. Gupta, S., Hoffman, J. & Malik, J. (2016). CVPR.
  86. Self-knowledge distillation in natural language processing. Hahn, S. & Choi, H. (2019). RANLP.
  87. TextKD-GAN: Text generation using knowledge distillation and generative adversarial networks. Haidar, M. A. & Rezagholizadeh, M. (2019). Canadian Conference on Artificial Intelligence.
  88. Learning both weights and connections for efficient neural network. Han, S., Pool, J., Tran, J. & Dally, W. (2015). NeurIPS.
  89. Spatiotemporal distilled dense-connectivity network for video action recognition. Hao, W. & Zhang, Z. (2019). Pattern Recogn 92: 13–24.
  90. The knowledge within: Methods for data-free model compression. Haroush, M., Hubara, I., Hoffer, E., & Soudry, D. (2020). CVPR.
  91. Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge. He, C., Annavaram, M., & Avestimehr, S. (2020a). NeurIPS.
  92. Why ResNet works? Residuals generalize. He, F., Liu, T., & Tao, D. (2020b). IEEE TNNLS 31(12): 5349–5362.
  93. Deep residual learning for image recognition. He, K., Zhang, X., Ren, S. & Sun, J. (2016). CVPR.
  94. Knowledge adaptation for efficient semantic segmentation. He, T., Shen, C., Tian, Z., Gong, D., Sun, C. & Yan, Y. (2019). CVPR.
  95. A comprehensive overhaul of feature distillation. Heo, B., Kim, J., Yun, S., Park, H., Kwak, N., & Choi, J. Y. (2019a). ICCV.
  96. Knowledge distillation with adversarial samples supporting decision boundary. Heo, B., Lee, M., Yun, S. & Choi, J. Y. (2019b). AAAI.
  97. Knowledge transfer via distillation of activation boundaries formed by hidden neurons. Heo, B., Lee, M., Yun, S. & Choi, J. Y. (2019c). AAAI.
  98. Distilling the knowledge in a neural network. Hinton, G., Vinyals, O. & Dean, J. (2015).
  99. Learning with Side Information through Modality Hallucination. Hoffman, J., Gupta, S. & Darrell, T. (2016). CVPR.
  100. GAN-Knowledge Distillation for one-stage Object Detection. Hong, W. & Yu, J. (2019).
  101. Learning lightweight lane detection CNNs by self attention distillation. Hou, Y., Ma, Z., Liu, C. & Loy, C. C. (2019). ICCV.
  102. Inter-Region Affinity Distillation for Road Marking Segmentation. Hou, Y., Ma, Z., Liu, C., Hui, T. W., & Loy, C. C. (2020). CVPR.
  103. MobileNets: Efficient convolutional neural networks for mobile vision applications. Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017).
  104. Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing. Hu, H., Xie, L., Hong, R., & Tian, Q. (2020). CVPR.
  105. Attention-guided answer distillation for machine reading comprehension. Hu, M., Peng, Y., Wei, F., Huang, Z., Li, D., Yang, N. & et al. (2018). EMNLP.
  106. Densely connected convolutional networks. Huang, G., Liu, Z., van der Maaten, L. & Weinberger, K. Q. (2017). CVPR.
  107. Knowledge Distillation for Sequence Model. Huang, M., You, Y., Chen, Z., Qian, Y. & Yu, K. (2018). Interspeech.
  108. Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. Huang, Z. & Wang, N. (2017).
  109. Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection. Huang, Z., Zou, Y., Bhagavatula, V., & Huang, D. (2020). NeurIPS.
  110. Batch normalization: Accelerating deep network training by reducing internal covariate shift. Ioffe, S., & Szegedy, C. (2015). ICML.
  111. Learning what and where to transfer. Jang, Y., Lee, H., Hwang, S. J. & Shin, J. (2019). ICML.
  112. Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher. Ji, G., & Zhu, Z. (2020). NeurIPS.
  113. TinyBERT: Distilling BERT for natural language understanding. Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L. & et al. (2020). EMNLP.
  114. Knowledge distillation via route constrained optimization. Jin, X., Peng, B., Wu, Y., Liu, Y., Liu, J., Liang, D., Yan, J. & Hu, X. (2019). ICCV.
  115. Towards oracle knowledge distillation with neural architecture search. Kang, M., Mun, J. & Han, B. (2020). AAAI.
  116. Paraphrasing Complex Network: Network Compression via Factor Transfer. Kim, J., Park, S. & Kwak, N. (2018). NeurIPS.
  117. QKD: Quantization-aware Knowledge Distillation. Kim, J., Bhalgat, Y., Lee, J., Patel, C., & Kwak, N. (2019a).
  118. Feature fusion for online mutual knowledge distillation. Kim, J., Hyun, M., Chung, I. & Kwak, N. (2019b). ICPR.
  119. Transferring knowledge to smaller network with class-distance loss. Kim, S. W. & Kim, H. E. (2017). ICLRW.
  120. Sequence-Level Knowledge Distillation. Kim, Y. & Rush, A. M. (2016). EMNLP.
  121. Few-shot learning of neural networks from scratch by pseudo example optimization. Kimura, A., Ghahramani, Z., Takeuchi, K., Iwata, T. & Ueda, N. (2018). BMVC.
  122. Adaptive knowledge distillation based on entropy. Kwon, K., Na, H., Lee, H., & Kim, N. S. (2020). ICASSP.
  123. Cross-Resolution Face Recognition via Prior-Aided Face Hallucination and Residual Knowledge Distillation. Kong, H., Zhao, J., Tu, X., Xing, J., Shen, S. & Feng, J. (2019).
  124. Learning multiple layers of features from tiny images.
  125. Imagenet classification with deep convolutional neural networks. Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012). NeurIPS.
  126. Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser. Kuncoro, A., Ballesteros, M., Kong, L., Dyer, C. & Smith, N. A. (2016). EMNLP.
  127. Unsupervised multi-task adaptation using adversarial cross-task distillation. Kundu, J. N., Lakkakula, N. & Babu, R. V. (2019). CVPR.
  128. Dual Policy Distillation. Lai, K. H., Zha, D., Li, Y., & Hu, X. (2020). IJCAI.
  129. Self-Referenced Deep Learning. Lan, X., Zhu, X., & Gong, S. (2018). ACCV.
  130. Rethinking data augmentation: Self-supervision and self-distillation. Lee, H., Hwang, S. J. & Shin, J. (2019a).
  131. Overcoming catastrophic forgetting with unlabeled data in the wild. Lee, K., Lee, K., Shin, J. & Lee, H. (2019b). ICCV.
  132. Stochasticity and Skip Connection Improve Knowledge Transfer. Lee, K., Nguyen, L. T. & Shim, B. (2019c). AAAI.
  133. Graph-based knowledge distillation by multi-head attention network. Lee, S. & Song, B. (2019). BMVC.
  134. Self-supervised knowledge distillation using singular value decomposition. Lee, S. H., Kim, D. H. & Song, B. C. (2018). ECCV.
  135. Learning Light-Weight Translation Models from Deep Transformer. Li, B., Wang, Z., Liu, H., Du, Q., Xiao, T., Zhang, C., & Zhu, J. (2021). AAAI.
  136. Blockwisely Supervised Neural Architecture Search with Knowledge Distillation. Li, C., Peng, J., Yuan, L., Wang, G., Liang, X., Lin, L., & Chang, X. (2020a). CVPR.
  137. Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts. Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., & Zhang, T. (2020b). NeurIPS.
  138. Spatiotemporal knowledge distillation for efficient estimation of aerial video saliency. Li, J., Fu, K., Zhao, S. & Ge, S. (2019). IEEE TIP 29: 1902–1914.
  139. GAN compression: Efficient architectures for interactive conditional GANs. Li, M., Lin, J., Ding, Y., Liu, Z., Zhu, J. Y., & Han, S. (2020c). CVPR.
  140. Mimicking very efficient network for object detection. Li, Q., Jin, S. & Yan, J. (2017). CVPR.
  141. Few sample knowledge distillation for efficient network compression. Li, T., Li, J., Liu, Z., & Zhang, C. (2020d). CVPR.
  142. Local Correlation Consistency for Knowledge Distillation. Li, X., Wu, J., Fang, H., Liao, Y., Wang, F., & Qian, C. (2020e). ECCV.
  143. Learning without forgetting. Li, Z. & Hoiem, D. (2017). IEEE TPAMI 40(12): 2935–2947.
  144. Ensemble distillation for robust model fusion in federated learning. Lin, T., Kong, L., Stich, S. U., & Jaggi, M. (2020). NeurIPS.
  145. Knowledge flow: Improve upon your teachers. Liu, I. J., Peng, J. & Schwing, A. G. (2019a). ICLR.
  146. Exploiting the ground-truth: An adversarial imitation based knowledge distillation approach for event detection. Liu, J., Chen, Y. & Liu, K. (2019b). AAAI.
  147. Knowledge representing: efficient, sparse representation of prior knowledge for knowledge distillation. Liu, J., Wen, D., Gao, H., Tao, W., Chen, T. W., Osa, K. & et al. (2019c). CVPRW.
  148. DDFlow: Learning optical flow with unlabeled data distillation. Liu, P., King, I., Lyu, M. R., & Xu, J. (2019d). AAAI.
  149. KTAN: knowledge transfer adversarial network. Liu, P., Liu, W., Ma, H., Mei, T. & Seok, M. (2020a). IJCNN.
  150. Semantic-aware knowledge preservation for zero-shot sketch-based image retrieval. Liu, Q., Xie, L., Wang, H. & Yuille, A. L. (2019e). ICCV.
  151. Model compression with generative adversarial networks. Liu, R., Fusi, N. & Mackey, L. (2018).
  152. FastBERT: a self-distilling BERT with Adaptive Inference Time. Liu, W., Zhou, P., Zhao, Z., Wang, Z., Deng, H., & Ju, Q. (2020b). ACL.
  153. Improving the interpretability of deep neural networks with knowledge distillation. Liu, X., Wang, X. & Matwin, S. (2018b). ICDMW.
  154. Improving multi-task deep neural networks via knowledge distillation for natural language understanding. Liu, X., He, P., Chen, W. & Gao, J. (2019f).
  155. Knowledge distillation via instance relationship graph. Liu, Y., Cao, J., Li, B., Yuan, C., Hu, W., Li, Y. & Duan, Y. (2019g). CVPR.
  156. Structured knowledge distillation for semantic segmentation. Liu, Y., Chen, K., Liu, C., Qin, Z., Luo, Z. & Wang, J. (2019h). CVPR.
  157. Search to distill: Pearls are everywhere but not the eyes. Liu, Y., Jia, X., Tan, M., Vemulapalli, R., Zhu, Y., Green, B. & et al. (2019i). CVPR.
  158. Adaptive multi-teacher multi-level knowledge distillation. Liu, Y., Zhang, W., & Wang, J. (2020c). Neurocomputing 415: 106-113.
  159. Data-free knowledge distillation for deep neural networks. Lopes, R. G., Fenu, S. & Starner, T. (2017). NeurIPS.
  160. Unifying distillation and privileged information. Lopez-Paz, D., Bottou, L., Schölkopf, B. & Vapnik, V. (2016). ICLR.
  161. Knowledge distillation for small-footprint highway networks. Lu, L., Guo, M. & Renals, S. (2017). ICASSP.
  162. Face model compression by distilling knowledge from neurons. Luo, P., Zhu, Z., Liu, Z., Wang, X. & Tang, X. (2016). AAAI.
  163. Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning. Luo, S., Pan, W., Wang, X., Wang, D., Tang, H., & Song, M. (2020). ECCV.
  164. Knowledge amalgamation from heterogeneous networks by common feature learning. Luo, S., Wang, X., Fang, G., Hu, Y., Tao, D., & Song, M. (2019). IJCAI.
  165. Graph distillation for action detection with privileged modalities. Luo, Z., Hsieh, J. T., Jiang, L., Carlos Niebles, J. & Fei-Fei, L. (2018). ECCV.
  166. Improving neural architecture search image classifiers via ensemble learning. Macko, V., Weill, C., Mazzawi, H. & Gonzalvo, J. (2019). NeurIPS Workshop.
  167. Graph representation learning via multi-task knowledge distillation. Ma, J., & Mei, Q. (2019).
  168. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. Ma, N., Zhang, X., Zheng, H. T., & Sun, J. (2018). ECCV.
  169. Conditional teacher-student learning. Meng, Z., Li, J., Zhao, Y. & Gong, Y. (2019). ICASSP.
  170. Zero-shot Knowledge Transfer via Adversarial Belief Matching. Micaelli, P. & Storkey, A. J. (2019). NeurIPS.
  171. Knowledge transfer graph for deep collaborative learning. Minami, S., Hirakawa, T., Yamashita, T. & Fujiyoshi, H. (2019).
  172. Improved knowledge distillation via teacher assistant. Mirzadeh, S. I., Farajtabar, M., Li, A. & Ghasemzadeh, H. (2020). AAAI.
  173. Apprentice: Using knowledge distillation techniques to improve low-precision network accuracy. Mishra, A. & Marr, D. (2018). ICLR.
  174. Self-distillation amplifies regularization in Hilbert space. Mobahi, H., Farajtabar, M., & Bartlett, P. L. (2020). NeurIPS.
  175. Distilling word embeddings: An encoding approach. Mou, L., Jia, R., Xu, Y., Li, G., Zhang, L. & Jin, Z. (2016). CIKM.
  176. Cogni-net: Cognitive feature learning through deep visual perception. Mukherjee, P., Das, A., Bhunia, A. K. & Roy, P. P. (2019). ICIP.
  177. Online model distillation for efficient video inference. Mullapudi, R. T., Chen, S., Zhang, K., Ramanan, D. & Fatahalian, K. (2019). ICCV.
  178. When does label smoothing help? Muller, R., Kornblith, S. & Hinton, G. E. (2019). NeurIPS.
  179. Learning to specialize with knowledge distillation for visual question answering. Mun, J., Lee, K., Shin, J. & Han, B. (2018). NeurIPS.
  180. Knowledge distillation for end-to-end person search. Munjal, B., Galasso, F. & Amin, S. (2019). BMVC.
  181. Knowledge Distillation for Bilingual Dictionary Induction. Nakashole, N. & Flauger, R. (2017). EMNLP.
  182. Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation. Nayak, G. K., Mopuri, K. R., & Chakraborty, A. (2021). WACV.
  183. Zero-shot knowledge distillation in deep networks. Nayak, G. K., Mopuri, K. R., Shaj, V., Babu, R. V. & Chakraborty, A. (2019). ICML.
  184. Teacher-student training for text-independent speaker recognition. Ng, R. W., Liu, X. & Swietojanski, P. (2018). SLTW.
  185. Dynamic kernel distillation for efficient pose estimation in videos. Nie, X., Li, Y., Luo, L., Zhang, N. & Feng, J. (2019). ICCV.
  186. Boosting self-supervised learning via knowledge transfer. Noroozi, M., Vinjimoor, A., Favaro, P. & Pirsiavash, H. (2018). CVPR.
  187. Deep net triage: Analyzing the importance of network layers via structural compression. Nowak, T. S. & Corso, J. J. (2018).
  188. Parallel wavenet: Fast high-fidelity speech synthesis. Oord, A., Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K. & et al. (2018). ICML.
  189. Spatio-Temporal Graph for Video Captioning with Knowledge Distillation. Pan, B., Cai, H., Huang, D. A., Lee, K. H., Gaidon, A., Adeli, E., & Niebles, J. C. (2020). CVPR.
  190. A novel enhanced collaborative autoencoder with knowledge distillation for top-n recommender systems. Pan, Y., He, F. & Yu, H. (2019). Neurocomputing 332: 137–148.
  191. Semi-supervised knowledge transfer for deep learning from private training data. Papernot, N., Abadi, M., Erlingsson, U., Goodfellow, I. & Talwar, K. (2017). ICLR.
  192. Distillation as a defense to adversarial perturbations against deep neural networks. Papernot, N., McDaniel, P., Wu, X., Jha, S. & Swami, A. (2016). IEEE SP.
  193. Feature-level Ensemble Knowledge Distillation for Aggregating Knowledge from Multiple Networks. Park, S. & Kwak, N. (2020). ECAI.
  194. Relational knowledge distillation. Park, W., Kim, D., Lu, Y. & Cho, M. (2019). CVPR.
  195. ALP-KD: Attention-Based Layer Projection for Knowledge Distillation. Passban, P., Wu, Y., Rezagholizadeh, M., & Liu, Q. (2021). AAAI.
  196. Learning deep representations with probabilistic knowledge transfer. Passalis, N. & Tefas, A. (2018). ECCV.
  197. Probabilistic Knowledge Transfer for Lightweight Deep Representation Learning. Passalis, N., Tzelepi, M., & Tefas, A. (2020a). IEEE TNNLS.
  198. Heterogeneous Knowledge Distillation using Information Flow Modeling. Passalis, N., Tzelepi, M., & Tefas, A. (2020b). CVPR.
  199. Correlation congruence for knowledge distillation. Peng, B., Jin, X., Liu, J., Li, D., Wu, Y., Liu, Y. & et al. (2019a). ICCV.
  200. Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search. Peng, H., Du, H., Yu, H., Li, Q., Liao, J., & Fu, J. (2020). NeurIPS.
  201. Few-shot image recognition with knowledge transfer. Peng, Z., Li, Z., Zhang, J., Li, Y., Qi, G. J. & Tang, J. (2019b). ICCV.
  202. Audio-visual model distillation using acoustic images. Perez, A., Sanguineti, V., Morerio, P. & Murino, V. (2020). WACV.
  203. Towards understanding knowledge distillation. Phuong, M. & Lampert, C. H. (2019a). ICML.
  204. Distillation-based training for multi-exit architectures. Phuong, M., & Lampert, C. H. (2019b). ICCV.
  205. Refine and distill: Exploiting cycle-inconsistency and knowledge distillation for unsupervised monocular depth estimation. Pilzer, A., Lathuiliere, S., Sebe, N. & Ricci, E. (2019). CVPR.
  206. Model compression via distillation and quantization. Polino, A., Pascanu, R. & Alistarh, D. (2018). ICLR.
  207. Wise teachers train better DNN acoustic models. Price, R., Iso, K. & Shinoda, K. (2016). EURASIP Journal on Audio, Speech, and Music Processing 2016(1): 10.
  208. Data distillation: Towards omnisupervised learning. Radosavovic, I., Dollar, P., Girshick, R., Gkioxari, G., & He, K. (2018). CVPR.
  209. Designing network design spaces. Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., & Dollar, P. (2020). CVPR.
  210. Cross-modality distillation: A case for conditional generative adversarial networks. Roheda, S., Riggan, B. S., Krim, H. & Dai, L. (2018). ICASSP.
  211. FitNets: Hints for thin deep nets. Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., & Bengio, Y. (2015). ICLR.
  212. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. Ross, A. S. & Doshi-Velez, F. (2018). AAAI.
  213. Knowledge adaptation: Teaching to adapt. Ruder, S., Ghaffari, P. & Breslin, J. G. (2017).
  214. MobileNetV2: Inverted residuals and linear bottlenecks. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). CVPR.
  215. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Sanh, V., Debut, L., Chaumond, J. & Wolf, T. (2019).
  216. Distilling knowledge from a deep pose regressor network. Saputra, M. R. U., de Gusmao, P. P., Almalioglu, Y., Markham, A. & Trigoni, N. (2019). ICCV.
  217. Deep model compression: Distilling knowledge from noisy teachers. Sau, B. B. & Balasubramanian, V. N. (2016).
  218. Federated Knowledge Distillation. Seo, H., Park, J., Oh, S., Bennis, M., & Kim, S. L. (2020).
  219. Knowledge distillation in document retrieval. Shakeri, S., Sethy, A. & Cheng, C. (2019).
  220. Amalgamating knowledge towards comprehensive classification. Shen, C., Wang, X., Song, J., Sun, L., & Song, M. (2019a). AAAI.
  221. Progressive Network Grafting for Few-Shot Knowledge Distillation. Shen, C., Wang, X., Yin, Y., Song, J., Luo, S., & Song, M. (2021). AAAI.
  222. Customizing student networks from heterogeneous teachers via adaptive knowledge amalgamation. Shen, C., Xue, M., Wang, X., Song, J., Sun, L., & Song, M. (2019b). ICCV.
  223. In teacher we trust: Learning compressed models for pedestrian detection. Shen, J., Vesdapunt, N., Boddeti, V. N. & Kitani, K. M. (2016).
  224. Feature representation of short utterances based on knowledge distillation for spoken language identification. Shen, P., Lu, X., Li, S. & Kawai, H. (2018). Interspeech.
  225. Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification. Shen, P., Lu, X., Li, S., & Kawai, H. (2020). IEEE/ACM T AUDIO SPE 28: 2674-2683.
  226. Interactive learning of teacher-student model for short utterance spoken language identification. Shen, P., Lu, X., Li, S. & Kawai, H. (2019c). ICASSP.
  227. MEAL: Multi-model ensemble via adversarial learning. Shen, Z., He, Z. & Xue, X. (2019d). AAAI.
  228. Compression of acoustic event detection models with quantized distillation. Shi, B., Sun, M., Kao, C. C., Rozgic, V., Matsoukas, S. & Wang, C. (2019a). Interspeech.
  229. Semi-supervised acoustic event detection based on tri-training. Shi, B., Sun, M., Kao, CC., Rozgic, V., Matsoukas, S. & Wang, C. (2019b). ICASSP.
  230. Knowledge distillation for recurrent neural network language modeling with trust regularization. Shi, Y., Hwang, M. Y., Lei, X. & Sheng, H. (2019c). ICASSP.
  231. Empirical analysis of knowledge distillation technique for optimization of quantized deep neural networks. Shin, S., Boo, Y. & Sung, W. (2019).
  232. Incremental learning of object detectors without catastrophic forgetting. Shmelkov, K., Schmid, C. & Alahari, K. (2017). ICCV.
  233. Knowledge squeezed adversarial network compression. Shu, C., Li, P., Xie, Y., Qu, Y., Dai, L., & Ma, L. (2019).
  234. Video object segmentation using teacher-student adaptation in a human robot interaction (hri) setting. Siam, M., Jiang, C., Lu, S., Petrich, L., Gamal, M., Elhoseiny, M. & et al. (2019). ICRA.
  235. Structured transforms for small-footprint deep learning. Sindhwani, V., Sainath, T. & Kumar, S. (2015). NeurIPS.
  236. Mastering the game of Go with deep neural networks and tree search. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... & Dieleman, S. (2016). Nature, 529(7587): 484–489.
  237. Neural compatibility modeling with attentive knowledge distillation. Song, X., Feng, F., Han, X., Yang, X., Liu, W. & Nie, L. (2018). SIGIR.
  238. Knowledge transfer with jacobian matching. Srinivas, S. & Fleuret, F. (2018). ICML.
  239. Adapting models to signal degradation using distillation. Su, J. C. & Maji, S. (2017). BMVC.
  240. Collaborative Teacher-Student Learning via Multiple Knowledge Transfer. Sun, L., Gou, J., Du, L., & Tao, D. (2021).
  241. Patient knowledge distillation for bert model compression. Sun, S., Cheng, Y., Gan, Z. & Liu, J. (2019). EMNLP-IJCNLP.
  242. Optimizing network performance for distributed dnn training on gpu clusters: Imagenet/alexnet training in 1.5 minutes. Sun, P., Feng, W., Han, R., Yan, S., & Wen, Y. (2019).
  243. An investigation of a knowledge distillation method for ctc acoustic models. Takashima, R., Li, S. & Kawai, H. (2018). ICASSP.
  244. KTGAN: Knowledge-Transfer Generative Adversarial Network for Text-to-Image Synthesis. Tan, H., Liu, X., Liu, M., Yin, B., & Li, X. (2021). IEEE TIP 30: 1275-1290.
  245. Mnasnet: Platform-aware neural architecture search for mobile. Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., & Le, Q. V. (2019). CVPR.
  246. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Tan, M., & Le, Q. (2019). ICML.
  247. Multilingual neural machine translation with knowledge distillation. Tan, X., Ren, Y., He, D., Qin, T., Zhao, Z. & Liu, T. Y. (2019). ICLR.
  248. Understanding and Improving Knowledge Distillation. Tang, J., Shivanna, R., Zhao, Z., Lin, D., Singh, A., Chi, E. H., & Jain, S. (2020).
  249. Ranking distillation: Learning compact ranking models with high performance for recommender system. Tang, J. & Wang, K. (2018). SIGKDD.
  250. Distilling task-specific knowledge from bert into simple neural networks. Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O. & Lin, J. (2019).
  251. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Tarvainen, A., & Valpola, H. (2017). NeurIPS.
  252. Cross-modal knowledge distillation for action recognition. Thoker, F. M. & Gall, J. (2019). ICIP.
  253. Contrastive representation distillation. Tian, Y., Krishnan, D. & Isola, P. (2020). ICLR.
  254. Understanding Generalization in Recurrent Neural Networks. Tu, Z., He, F., & Tao, D. (2020). ICLR.
  255. Similarity-preserving knowledge distillation. Tung, F. & Mori, G. (2019). ICCV.
  256. Well-read students learn better: The impact of student initialization on knowledge distillation. Turc, I., Chang, M. W., Lee, K. & Toutanova, K. (2019).
  257. Access to unlabeled data can speed up prediction time. Urner, R., Shalev-Shwartz, S. & Ben-David, S. (2011). ICML.
  258. Do deep convolutional nets really need to be deep and convolutional? Urban, G., Geras, K. J., Kahou, S. E., Aslan, O., Wang, S., Caruana, R. & et al. (2017). ICLR.
  259. Learning using privileged information: similarity control and knowledge transfer. Vapnik, V. & Izmailov, R. (2015). J Mach Learn Res 16(1): 2023-2049.
  260. Unifying heterogeneous classifiers with distillation. Vongkulbhisal, J., Vinayavekhin, P. & Visentini-Scarzanella, M. (2019). CVPR.
  261. Online Ensemble Model Compression using Knowledge Distillation. Walawalkar, D., Shen, Z., & Savvides, M. (2020). ECCV.
  262. Model distillation with knowledge transfer from face classification to alignment and verification. Wang, C., Lan, X. & Zhang, Y. (2017).
  263. Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. Wang, L., & Yoon, K. J. (2020).
  264. Progressive blockwise knowledge distillation for neural network acceleration. Wang, H., Zhao, H., Li, X. & Tan, X. (2018a). IJCAI.
  265. Private model compression via knowledge distillation. Wang, J., Bao, W., Sun, L., Zhu, X., Cao, B. & Yu, P. S. (2019a). AAAI.
  266. Deepvid: Deep visual interpretation and diagnosis for image classifiers via knowledge distillation. Wang, J., Gou, L., Zhang, W., Yang, H. & Shen, H. W. (2019b). TVCG 25(6): 2168-2180.
  267. Discover the effective strategy for face recognition model compression by improved knowledge distillation. Wang, M., Liu, R., Abe, N., Uchida, H., Matsunami, T. & Yamada, S. (2018b). ICIP.
  268. Improved knowledge distillation for training fast low resolution face recognition model. Wang, M., Liu, R., Hajime, N., Narishige, A., Uchida, H. & Matsunami, T. (2019c). ICCVW.
  269. Distilling Object Detectors with Fine-grained Feature Imitation. Wang, T., Yuan, L., Zhang, X. & Feng, J. (2019d). CVPR.
  270. Dataset distillation. Wang, T., Zhu, J. Y., Torralba, A., & Efros, A. A. (2018c).
  271. Minilm: Deep self-attention distillation for task-agnostic compression of pretrained transformers. Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., & Zhou, M. (2020a). NeurIPS.
  272. A teacher-student framework for maintainable dialog manager. Wang, W., Zhang, J., Zhang, H., Hwang, M. Y., Zong, C. & Li, Z. (2018d). EMNLP.
  273. Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition. Wang, X., Fu, T., Liao, S., Wang, S., Lei, Z., & Mei, T. (2020b). ECCV.
  274. Progressive teacher-student learning for early action prediction. Wang, X., Hu, J. F., Lai, J. H., Zhang, J. & Zheng, W. S. (2019e). CVPR.
  275. Kdgan: Knowledge distillation with generative adversarial networks. Wang, X., Zhang, R., Sun, Y. & Qi, J. (2018e). NeurIPS.
  276. Packing convolutional neural networks in the frequency domain. Wang, Y., Xu, C., Xu, C. & Tao, D. (2019f). IEEE TPAMI 41(10): 2495–2510.
  277. Adversarial learning of portable student networks. Wang, Y., Xu, C., Xu, C. & Tao, D. (2018f). AAAI.
  278. Joint architecture and knowledge distillation in CNN for Chinese text recognition. Wang, Z. R., & Du, J. (2021). Pattern Recognition 111: 107722.
  279. Student-teacher network learning with enhanced features. Watanabe, S., Hori, T., Le Roux, J. & Hershey, J. R. (2017). ICASSP.
  280. Online distilling from checkpoints for neural machine translation. Wei, H. R., Huang, S., Wang, R., Dai, X. & Chen, J. (2019). NAACL-HLT.
  281. Quantization mimic: Towards very tiny cnn for object detection. Wei, Y., Pan, X., Qin, H., Ouyang, W. & Yan, J. (2018). ECCV.
  282. Sequence student-teacher training of deep neural networks. Wong, J. H. & Gales, M. (2016). Interspeech.
  283. Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., ... & Keutzer, K. (2019). CVPR.
  284. Distilled person re-identification: Towards a more scalable system. Wu, A., Zheng, W. S., Guo, X. & Lai, J. H. (2019a). CVPR.
  285. Peer Collaborative Learning for Online Knowledge Distillation. Wu, G., & Gong, S. (2021). AAAI.
  286. Quantized convolutional neural networks for mobile devices. Wu, J., Leng, C., Wang, Y., Hu, Q. & Cheng, J. (2016). CVPR.
  287. Multi-teacher knowledge distillation for compressed video action recognition on deep neural networks. Wu, M. C., Chiu, C. T. & Wu, K. H. (2019b). ICASSP.
  288. Learning an evolutionary embedding via massive knowledge distillation. Wu, X., He, R., Hu, Y., & Sun, Z. (2020). International Journal of Computer Vision, 1-18.
  289. Complete random forest based class noise filtering learning for improving the generalizability of classifiers. Xia, S., Wang, G., Chen, Z., & Duan, Y. (2018). IEEE TKDE 31(11): 2063-2078.
  290. Training convolutional neural networks with cheap convolutions and online distillation. Xie, J., Lin, S., Zhang, Y. & Luo, L. (2019).
  291. Self-training with Noisy Student improves ImageNet classification. Xie, Q., Hovy, E., Luong, M. T., & Le, Q. V. (2020). CVPR.
  292. Knowledge Distillation Meets Self-Supervision. Xu, G., Liu, Z., Li, X., & Loy, C. C. (2020a). ECCV.
  293. Feature Normalized Knowledge Distillation for Image Classification. Xu, K., Rui, L., Li, Y., & Gu, L. (2020b). ECCV.
  294. Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control. Xu, Z., Wu, K., Che, Z., Tang, J., & Ye, J. (2020c). NeurIPS.
  295. Training shallow and thin networks for acceleration via knowledge distillation with conditional adversarial networks. Xu, Z., Hsu, Y. C. & Huang, J. (2018a). ICLR Workshop.
  296. Training student networks for acceleration with conditional adversarial networks. Xu, Z., Hsu, Y. C. & Huang, J. (2018b). BMVC.
  297. Data-distortion guided self-distillation for deep neural networks. Xu, T. B., & Liu, C. L. (2019). AAAI.
  298. Vargfacenet: An efficient variable group convolutional neural network for lightweight face recognition. Yan, M., Zhao, M., Xu, Z., Zhang, Q., Wang, G. & Su, Z. (2019). ICCVW.
  299. Knowledge distillation in generations: More tolerant teachers educate better students. Yang, C., Xie, L., Qiao, S. & Yuille, A. (2019a). AAAI.
  300. Snapshot distillation: Teacher-student optimization in one generation. Yang, C., Xie, L., Su, C. & Yuille, A. L. (2019b). CVPR.
  301. Knowledge distillation via adaptive instance normalization. Yang, J., Martinez, B., Bulat, A., & Tzimiropoulos, G. (2020a). ECCV.
  302. Distilling Knowledge From Graph Convolutional Networks. Yang, Y., Qiu, J., Song, M., Tao, D. & Wang, X. (2020b). CVPR.
  303. TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing. Yang, Z., Cui, Y., Chen, Z., Che, W., Liu, T., Wang, S., & Hu, G. (2020c). ACL.
  304. Model compression with two-stage multi-teacher knowledge distillation for web question answering system. Yang, Z., Shou, L., Gong, M., Lin, W. & Jiang, D. (2020d). WSDM.
  305. Knowledge Transfer via Dense Cross-Layer Mutual-Distillation. Yao, A., & Sun, D. (2020). ECCV.
  306. Graph Few-shot Learning via Knowledge Transfer. Yao, H., Zhang, C., Wei, Y., Jiang, M., Wang, S., Huang, J., Chawla, N. V., & Li, Z. (2020). AAAI.
  307. Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN. Ye, J., Ji, Y., Wang, X., Gao, X., & Song, M. (2020). CVPR.
  308. Student becoming the master: Knowledge amalgamation for joint scene parsing, depth estimation, and more. Ye, J., Ji, Y., Wang, X., Ou, K., Tao, D. & Song, M. (2019). CVPR.
  309. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. Yim, J., Joo, D., Bae, J. & Kim, J. (2017). CVPR.
  310. Dreaming to distill: Data-free knowledge transfer via DeepInversion. Yin, H., Molchanov, P., Alvarez, J. M., Li, Z., Mallya, A., Hoiem, D., Jha, N. K., & Kautz, J. (2020). CVPR.
  311. Knowledge extraction with no observable data. Yoo, J., Cho, M., Kim, T., & Kang, U. (2019). NeurIPS.
  312. Learning from multiple teacher networks. You, S., Xu, C., Xu, C. & Tao, D. (2017). SIGKDD.
  313. Learning with single-teacher multi-student. You, S., Xu, C., Xu, C. & Tao, D. (2018). AAAI.
  314. Large batch optimization for deep learning: Training bert in 76 minutes. You, Y., Li, J., Reddi, S., Hseu, J., Kumar, S., Bhojanapalli, S., ... & Hsieh, C. J. (2019). ICLR.
  315. Learning metrics from teachers: Compact networks for image embedding. Yu, L., Yazici, V. O., Liu, X., Weijer, J., Cheng, Y. & Ramisa, A. (2019). CVPR.
  316. On compressing deep models by low rank and sparse decomposition. Yu, X., Liu, T., Wang, X., & Tao, D. (2017). CVPR.
  317. Reinforced Multi-Teacher Selection for Knowledge Distillation. Yuan, F., Shou, L., Pei, J., Lin, W., Gong, M., Fu, Y., & Jiang, D. (2021). AAAI.
  318. Revisit knowledge distillation: a teacher-free framework. Yuan, L., Tay, F. E., Li, G., Wang, T. & Feng, J. (2020). CVPR.
  319. CKD: Cross-task knowledge distillation for text-to-image synthesis. Yuan, M., & Peng, Y. (2020). IEEE TMM 22(8): 1955-1968.
  320. Matching Guided Distillation. Yue, K., Deng, J., & Zhou, F. (2020). ECCV.
  321. Regularizing Class-wise Predictions via Self-knowledge Distillation. Yun, S., Park, J., Lee, K. & Shin, J. (2020). CVPR.
  322. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. Zagoruyko, S. & Komodakis, N. (2017). ICLR.
  323. Lifelong gan: Continual learning for conditional image generation. Zhai, M., Chen, L., Tung, F., He, J., Nawhal, M. & Mori, G. (2019). ICCV.
  324. Doubly convolutional neural networks. Zhai, S., Cheng, Y., Zhang, Z. M. & Lu, W. (2016). NeurIPS.
  325. Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation. Zhao, C., & Hospedales, T. (2020). NeurIPS.
  326. Highlight every step: Knowledge distillation via collaborative teaching. Zhao, H., Sun, X., Dong, J., Chen, C., & Dong, Z. (2020a). IEEE TCYB.
  327. Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge. Zhao, L., Peng, X., Chen, Y., Kapadia, M., & Metaxas, D. N. (2020b).
  328. Through-wall human pose estimation using radio signals. Zhao, M., Li, T., Abu Alsheikh, M., Tian, Y., Zhao, H., Torralba, A. & Katabi, D. (2018). CVPR.
  329. Better and faster: knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification. Zhang, C. & Peng, Y. (2018). IJCAI.
  330. Fast human pose estimation. Zhang, F., Zhu, X. & Ye, M. (2019a). CVPR.
  331. An information-theoretic view for deep learning. Zhang, J., Liu, T., & Tao, D. (2018).
  332. Adversarial co-distillation learning for image recognition. Zhang, H., Hu, Z., Qin, W., Xu, M., & Wang, M. (2021a). Pattern Recognition 111: 107659.
  333. Task-Oriented Feature Distillation. Zhang, L., Shi, Y., Shi, Z., Ma, K., & Bao, C. (2020a). NeurIPS.
  334. Be your own teacher: Improve the performance of convolutional neural networks via self distillation. Zhang, L., Song, J., Gao, A., Chen, J., Bao, C. & Ma, K. (2019b). ICCV.
  335. Discriminability distillation in group representation learning. Zhang, M., Song, G., Zhou, H., & Liu, Y. (2020b). ECCV.
  336. Future-Guided Incremental Transformer for Simultaneous Translation. Zhang, S., Feng, Y., & Li, L. (2021b). AAAI.
  337. Knowledge Integration Networks for Action Recognition. Zhang, S., Guo, S., Wang, L., Huang, W., & Scott, M. R. (2020c). AAAI.
  338. Reliable Data Distillation on Graph Convolutional Network. Zhang, W., Miao, X., Shao, Y., Jiang, J., Chen, L., Ruas, O., & Cui, B. (2020d). ACM SIGMOD.
  339. Diverse Knowledge Distillation for End-to-End Person Search. Zhang, X., Wang, X., Bian, J. W., Shen, C., & You, M. (2021c). AAAI.
  340. Shufflenet: An extremely efficient convolutional neural network for mobile devices. Zhang, X., Zhou, X., Lin, M. & Sun, J. (2018a). CVPR.
  341. Prime-Aware Adaptive Distillation. Zhang, Y., Lan, Z., Dai, Y., Zeng, F., Bai, Y., Chang, J., & Wei, Y. (2020e). ECCV.
  342. Deep mutual learning. Zhang, Y., Xiang, T., Hospedales, T. M. & Lu, H. (2018b). CVPR.
  343. Self-Distillation as Instance-Specific Label Smoothing. Zhang, Z., & Sabuncu, M. R. (2020). NeurIPS.
  344. Object Relational Graph with Teacher-Recommended Learning for Video Captioning. Zhang, Z., Shi, Y., Yuan, C., Li, B., Wang, P., Hu, W., & Zha, Z. J. (2020f). CVPR.
  345. Understanding knowledge distillation in non-autoregressive machine translation. Zhou, C., Neubig, G. & Gu, J. (2019a). ICLR.
  346. Rocket launching: A universal and efficient framework for training well-performing light net. Zhou, G., Fan, Y., Cui, R., Bian, W., Zhu, X. & Gai, K. (2018). AAAI.
  347. Two-stage image classification supervised by a single teacher single student model. Zhou, J., Zeng, S. & Zhang, B. (2019b). BMVC.
  348. M2KD: Incremental Learning via Multi-model and Multi-level Knowledge Distillation. Zhou, P., Mai, L., Zhang, J., Xu, N., Wu, Z. & Davis, L. S. (2020). BMVC.
  349. Low-resolution visual recognition via deep feature distillation. Zhu, M., Han, K., Zhang, C., Lin, J. & Wang, Y. (2019). ICASSP.

Acknowledgement:

Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2021). Knowledge Distillation: A Survey. IJCV, 129(6), 1789-1819.