- Bengio, Yoshua, Ian J. Goodfellow, and Aaron Courville. Deep learning. An MIT Press book. (2015). pdf
- LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature 521.7553 (2015): 436-444. pdf
- Y. LeCun, L. Bottou, Y. Bengio and P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11):2278-2324. 1998. pdf (Seminal Paper: LeNet)
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems. 2012. pdf
- Simonyan, Karen, and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014). pdf
- Szegedy, Christian, et al. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. pdf
- He, Kaiming, et al. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015). pdf (ResNet)
- Huang, G. et al. Densely Connected Convolutional Networks. arXiv preprint arXiv:1608.06993 (2017) pdf (DenseNet)
- Hu, Jie et al. Squeeze-and-Excitation Networks. arXiv preprint arXiv:1709.01507 (2017) pdf
- Howard, A. G. et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. pdf
- Tan, M. and Le, Q. V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. pdf
- Xie, Q. et al. Self-training with Noisy Student improves ImageNet classification. pdf
- Bojarski, M. et al. End to End Learning for Self-Driving Cars. pdf
- H. A. Rowley, S. Baluja, and T. Kanade, Neural network-based face detection, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognition, pp. 203–208, 1996. pdf
- P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. pdf
- Szegedy, Christian, Alexander Toshev, and Dumitru Erhan. Deep neural networks for object detection. Advances in Neural Information Processing Systems. 2013. pdf
- Girshick, Ross, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern recognition. 2014. pdf (R-CNN)
- He, Kaiming, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. European Conference on Computer Vision. Springer International Publishing, 2014. pdf (SPPNet)
- Girshick, Ross. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision. 2015. pdf
- Ren, Shaoqing, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in neural information processing systems. 2015. pdf
- Redmon, Joseph, et al. You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640 (2015). pdf
- Liu, Wei, et al. SSD: Single Shot MultiBox Detector. arXiv preprint arXiv:1512.02325 (2015). pdf
- Dai, Jifeng, et al. R-FCN: Object Detection via Region-based Fully Convolutional Networks. arXiv preprint arXiv:1605.06409 (2016). pdf
- K. He et al. Mask R-CNN. arXiv preprint arXiv:1703.06870 (2017). pdf
- Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. arXiv:1612.03144 (2017). pdf
- Esteban Real, Alok Aggarwal, Yanping Huang: Regularized Evolution for Image Classifier Architecture Search, 2018; arXiv:1802.01548 pdf
- Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang: NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection, 2019; arXiv:1904.07392 pdf
- Chenchen Zhu, Yihui He: Feature Selective Anchor-Free Module for Single-Shot Object Detection, 2019; arXiv:1903.00621 pdf
- Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Xinyu Xiao: DetNAS: Backbone Search for Object Detection, 2019; arXiv:1903.10979 pdf
- Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang: CenterNet: Keypoint Triplets for Object Detection, 2019; arXiv:1904.08189 pdf
- Mingxing Tan, Ruoming Pang: EfficientDet: Scalable and Efficient Object Detection, 2019; arXiv:1911.09070 pdf
Segmentation:
- J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation. in CVPR, 2015. pdf
- O. Ronneberger et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. 2015. pdf
- Multi-Scale Context Aggregation by Dilated Convolutions. 2016. pdf
- DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. 2016. pdf
- Rethinking Atrous Convolution for Semantic Image Segmentation. 2017. pdf
- K. He et al. Mask R-CNN. arXiv preprint arXiv:1703.06870. 2017. pdf
- Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. 2018. pdf
- Learning to Segment Everything. 2018. pdf
Self-Supervised Learning:
- Unsupervised Visual Representation Learning by Context Prediction. 2015. pdf
- Colorful Image Colorization. 2016. pdf
- Representation Learning by Learning to Count. 2017. pdf
- Learning and Using the Arrow of Time. 2018. pdf
- Tracking Emerges by Colorizing Videos. 2018. pdf
- Audio-Visual Scene Analysis with Self-Supervised Multi-sensory Features. 2018. pdf
- Object Discovery with a Copy-Pasting GAN. 2019. pdf
- SimCLR: A Simple Framework for Contrastive Learning of Representations. 2020. pdf
Generative Adversarial Networks:
- Kingma, D, and Welling, M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013). pdf
- Goodfellow, Ian, et al. Generative adversarial nets. 2014. pdf
- Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759 (2016). pdf
- Makhzani, Alireza, et al. Adversarial Autoencoders. arXiv:1511.05644 (2015). pdf
- Gregor, Karol, et al. DRAW: A recurrent neural network for image generation. arXiv:1502.04623 (2015). pdf
Applications:
- Wasserstein GAN. 2017. pdf
- Large Scale GAN Training for High Fidelity Natural Image Synthesis. 2018. pdf
- A Style-based Generator Architecture for Generative Adversarial Networks. 2018. pdf
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. 2017. pdf
- Conditional LSTM-GAN for Melody Generation from Lyrics. 2019. pdf
- GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction. 2019. pdf
Art:
- Mordvintsev, Alexander; Olah, Christopher; Tyka, Mike (2015). Inceptionism: Going Deeper into Neural Networks. Google Research. html
- Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576 (2015). pdf
- CAN: Creative Adversarial Networks. 2017. pdf
- Semantic Image Synthesis with Spatially-Adaptive Normalization. 2019. pdf
- Deep Poetry: Word-Level and Char-Level Language Models for Shakespearean Sonnet Generation. pdf
- BachProp: Learning to Compose Music in Multiple Styles. 2018. pdf
- A 'New' Rembrandt: From the Frontiers of AI And Not The Artist's Atelier. 2016. html
- Is artificial intelligence set to become art’s next medium? 2018. html
- AI Will Enhance - Not End - Human Art. 2019. html
- An AI-Written Novella Almost Won a Literary Prize. 2016. html
- How AI-Generated Music Is Changing The Way Hits Are Made. 2018. html
- AI puts final notes on Beethoven's Tenth Symphony. 2019. html
Previous Papers
- Zhu, Jun-Yan, et al. Generative Visual Manipulation on the Natural Image Manifold. European Conference on Computer Vision. Springer International Publishing, 2016. pdf
- Champandard, Alex J. Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks. arXiv preprint arXiv:1603.01768 (2016). pdf
- Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. arXiv preprint arXiv:1603.08155 (2016). pdf
- Vincent Dumoulin, Jonathon Shlens and Manjunath Kudlur. A learned representation for artistic style. arXiv preprint arXiv:1610.07629 (2016). pdf
- Gatys, Leon and Ecker, et al. Controlling Perceptual Factors in Neural Style Transfer. arXiv preprint arXiv:1611.07865 (2016). pdf
- Ulyanov, Dmitry and Lebedev, Vadim, et al. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images. arXiv preprint arXiv:1603.03417 (2016). pdf
- Bengio, Yoshua, et al. A Neural Probabilistic Language Model. JMLR (2003). pdf
- Graves, Alex. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013). pdf (LSTM; strong generation results that show the power of RNNs)
- Mikolov, et al. Distributed representations of words and phrases and their compositionality. NIPS (2013). pdf
- Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. Advances in neural information processing systems. 2014. pdf
- Bahdanau, Dzmitry, KyungHyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. (2014). pdf
- Ashish Vaswani, et al. Attention is All you Need. NIPS (2017) pdf
- Matthew Peters, et al. Deep Contextualized Word Representations. pdf
- Jeremy Howard, et al. Universal Language Model Fine-Tuning for Text Classification. ACL (2018). pdf
- Jacob Devlin, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. (2019). pdf
- Victor Sanh, et al. DistilBERT, a distilled version of BERT. arXiv preprint arXiv:1910.01108 (2019). pdf
- Lee, et al. Fully Character-Level Neural Machine Translation without Explicit Segmentation. (2016) pdf
- Wu, Schuster, Chen, Le, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. pdf
- Jonas Gehring, et al. Convolutional Sequence to Sequence Learning. (2017). pdf
- Lample, et al. Phrase-Based & Neural Unsupervised Machine Translation. (2018) pdf
- Ye Jia, et al. Direct Speech-to-Speech Translation with a Sequence-to-Sequence Model. (2019). pdf
- Wen, et al. Recurrent Neural Network Language Generation for Spoken Dialogue Systems. (2019) pdf
- Mrksic, et al. Multi-domain Dialog State Tracking using RNNs. (2015) pdf
- Srinivasan, et al. Natural Language Generation using Reinforcement Learning with External Rewards. (2019). pdf
- Zhu, et al. SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering. (2018) pdf
- Xiong, et al. Achieving Human Parity in Conversational Speech Recognition. arXiv:1610.05256 (2016). pdf
- Mnih, Volodymyr, et al. Playing atari with deep reinforcement learning. (2013). pdf
- Silver, David, et al. Mastering the game of Go with deep neural networks and tree search. (2016) pdf
- Silver, David, et al. Mastering the game of Go without Human Knowledge. (2017) pdf
- Silver, David, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. (2017) pdf
- OpenAI. Learning Dexterous In-Hand Manipulation. pdf
Previous Papers
- Mnih, Volodymyr, et al. Human-level control through deep reinforcement learning. (2015) pdf
- Wang, Ziyu, Nando de Freitas, and Marc Lanctot. Dueling network architectures for deep reinforcement learning. (2015). pdf
- Mnih, Volodymyr, et al. Asynchronous methods for deep reinforcement learning. (2016). pdf
- Lillicrap, Timothy P., et al. Continuous control with deep reinforcement learning. (2015). pdf
- Gu, Shixiang, et al. Continuous Deep Q-Learning with Model-based Acceleration. (2016). pdf
- Schulman, John, et al. Trust region policy optimization. CoRR, abs/1502.05477 (2015). pdf
- Le, Quoc V. Building high-level features using large scale unsupervised learning. pdf
- Kingma, Diederik P., and Max Welling. Auto-encoding variational bayes. (2013). pdf
- Goodfellow, Ian, et al. Generative adversarial nets. Advances in Neural Information Processing Systems. 2014. pdf
- Radford, Alec, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. (2015). pdf
- Gregor, Karol, et al. DRAW: A recurrent neural network for image generation. (2015). pdf
- Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. (2016). pdf
- Oord, Aaron van den, et al. Conditional image generation with PixelCNN decoders. (2016). pdf
- Farhadi, Ali, et al. Every picture tells a story: Generating sentences from images. 2010. pdf
- Kulkarni, Girish, et al. Baby talk: Understanding and generating image descriptions. 2011. pdf
- Vinyals, Oriol, et al. Show and tell: A neural image caption generator. 2014. pdf
- Donahue, Jeff, et al. Long-term recurrent convolutional networks for visual recognition and description. pdf
- Karpathy, Andrej, and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. 2014. pdf
- Karpathy, Andrej, Armand Joulin, and Fei Fei F. Li. Deep fragment embeddings for bidirectional image sentence mapping. 2014. pdf
- Fang, Hao, et al. From captions to visual concepts and back. 2014. pdf
- Chen, Xinlei, and C. Lawrence Zitnick. Learning a recurrent visual representation for image caption generation. 2014. pdf
- Mao, Junhua, et al. Deep captioning with multimodal recurrent neural networks. 2014. pdf
- Xu, Kelvin, et al. Show, attend and tell: Neural image caption generation with visual attention. 2015. pdf
- Hinton, Geoffrey, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. (2012) pdf
- Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. 2013 pdf
- Graves, Alex, and Navdeep Jaitly. Towards End-To-End Speech Recognition with Recurrent Neural Networks. 2014. pdf
- Sak, Haşim, et al. Fast and accurate recurrent neural network acoustic models for speech recognition. (2015). pdf
- Amodei, Dario, et al. Deep speech 2: End-to-end speech recognition in english and mandarin. (2015). pdf
- W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig Achieving Human Parity in Conversational Speech Recognition. (2016) pdf
- Hinton, Geoffrey E., et al. Improving neural networks by preventing co-adaptation of feature detectors. pdf
- Srivastava, Nitish, et al. Dropout: a simple way to prevent neural networks from overfitting. pdf
- Ioffe, Sergey, and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. pdf
- Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. pdf
- Courbariaux, Matthieu, et al. Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1. pdf
- Jaderberg, Max, et al. Decoupled neural interfaces using synthetic gradients. pdf
- Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. Net2net: Accelerating learning via knowledge transfer. pdf
- Wei, Tao, et al. Network Morphism. arXiv preprint arXiv:1603.01670 (2016). pdf
- Sutskever, Ilya, et al. On the importance of initialization and momentum in deep learning. pdf
- Kingma, Diederik, and Jimmy Ba. Adam: A method for stochastic optimization. pdf
- Andrychowicz, Marcin, et al. Learning to learn by gradient descent by gradient descent. pdf
- Han, Song, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. pdf
- Iandola, Forrest N., et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. pdf
- Koutník, Jan, et al. Evolving large-scale neural networks for vision-based reinforcement learning. Proceedings of the 15th annual conference on Genetic and evolutionary computation. ACM, 2013. pdf
- Levine, Sergey, et al. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research 17.39 (2016): 1-40. pdf
- Pinto, Lerrel, and Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. arXiv preprint arXiv:1509.06825 (2015). pdf
- Levine, Sergey, et al. Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection. arXiv preprint arXiv:1603.02199 (2016). pdf
- Zhu, Yuke, et al. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. arXiv preprint arXiv:1609.05143 (2016). pdf
- Yahya, Ali, et al. Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search. arXiv preprint arXiv:1610.00673 (2016). pdf
- Gu, Shixiang, et al. Deep Reinforcement Learning for Robotic Manipulation. arXiv preprint arXiv:1610.00633 (2016). pdf
- A Rusu, M Vecerik, Thomas Rothörl, N Heess, R Pascanu, R Hadsell. Sim-to-Real Robot Learning from Pixels with Progressive Nets. arXiv preprint arXiv:1610.04286 (2016). pdf
- Mirowski, Piotr, et al. Learning to navigate in complex environments. arXiv preprint arXiv:1611.03673 (2016). pdf
- Bengio, Yoshua. Deep Learning of Representations for Unsupervised and Transfer Learning. ICML Unsupervised and Transfer Learning 27 (2012): 17-36. pdf (A Tutorial)
- Silver, Daniel L., Qiang Yang, and Lianghao Li. Lifelong Machine Learning Systems: Beyond Learning Algorithms. AAAI Spring Symposium: Lifelong Machine Learning. 2013. pdf (A brief discussion about lifelong learning)
- Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015). pdf (Godfather's Work)
- Rusu, Andrei A., et al. Policy distillation. arXiv preprint arXiv:1511.06295 (2015). pdf (RL domain)
- Parisotto, Emilio, Jimmy Lei Ba, and Ruslan Salakhutdinov. Actor-mimic: Deep multitask and transfer reinforcement learning. arXiv preprint arXiv:1511.06342 (2015). pdf (RL domain)
- Rusu, Andrei A., et al. Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016). pdf (Outstanding Work, A novel idea)
- Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science 350.6266 (2015): 1332-1338. pdf (No Deep Learning, but worth reading)
- Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. Siamese Neural Networks for One-shot Image Recognition. (2015). pdf
- Santoro, Adam, et al. One-shot Learning with Memory-Augmented Neural Networks. arXiv preprint arXiv:1605.06065 (2016). pdf (A basic step to one shot learning)
- Vinyals, Oriol, et al. Matching Networks for One Shot Learning. arXiv preprint arXiv:1606.04080 (2016). pdf
- Hariharan, Bharath, and Ross Girshick. Low-shot visual object recognition. arXiv preprint arXiv:1606.02819 (2016). pdf (A step to large data)
- Graves, Alex, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401 (2014). pdf (Basic Prototype of Future Computer)
- Zaremba, Wojciech, and Ilya Sutskever. Reinforcement learning neural Turing machines. arXiv preprint arXiv:1505.00521 (2015). pdf
- Weston, Jason, Sumit Chopra, and Antoine Bordes. Memory networks. arXiv preprint arXiv:1410.3916 (2014). pdf
- Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. End-to-end memory networks. Advances in neural information processing systems. 2015. pdf
- Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. Pointer networks. Advances in Neural Information Processing Systems. 2015. pdf
- Graves, Alex, et al. Hybrid computing using a neural network with dynamic external memory. Nature (2016). pdf
Credit: Prof. Peter N. Belhumeur