AI Chip Paper List

About This Project

This project aims to help engineers, researchers, and students easily find and learn from the good ideas and designs in AI-related fields, such as AI/ML/DL accelerators, chips, and systems, proposed at the top-tier architecture conferences (ISCA, MICRO, ASPLOS, HPCA).

This project was initiated by the Advanced Computer Architecture Lab (ACA Lab) at Shanghai Jiao Tong University in collaboration with Biren Research. Articles from additional sources are being added. Please let us know if you have any comments or are willing to contribute.

The Listing of Tags

For guidance and searching purposes, tags and/or notes are assigned to all of these papers. We use the following tags to annotate them.

The Chronological Listing of Papers

We list all AI-related articles collected. Links to the paper, slides, and note are provided under the title of each article, if available. Updating is in progress.

ISCA

2020

Tags	Title	Authors	Affiliations
Inference; SIMD	High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs paper note	Glenn Henry; Parviz Palangpour	Centaur Technology
Inference; dataflow	Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workload paper note	Dennis Abts; Jonathan Ross	Groq Inc.
Spiking; dataflow; Sparsity	SpinalFlow: An Architecture and Dataflow Tailored for Spiking Neural Networks paper note	Surya Narayanan; Karl Taht	University of Utah
Inference; benchmarking	MLPerf Inference Benchmark paper note	Vijay Janapa Reddi; Lingjie Xu; et.al.
GPU; Compression	Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs paper note	Esha Choukse; Michael Sullivan	University of Texas at Austin; NVIDIA
Inference; runtime	A Multi-Neural Network Acceleration Architecture paper note	Eunjin Baek; Dongup Kwon; Jangwoo Kim	Seoul National University
Inference; Dynamic fixed-point	DRQ: Dynamic Region-Based Quantization for Deep Neural Network Acceleration paper note	Zhuoran Song; Naifeng Jing; Xiaoyao Liang	Shanghai Jiao Tong University
Training; LSTM; GPU	Echo: Compiler-Based GPU Memory Footprint Reduction for LSTM RNN Training paper note	Bojian Zheng; Nandita Vijaykumar	University of Toronto
Inference	DeepRecSys: A System for Optimizing End-to-End At-Scale Neural Recommendation paper note	Udit Gupta; Samuel Hsia; Vikram Saraph	Harvard University; Facebook Inc

2019

Tags	Title	Authors	Affiliations
Inference, Dataflow	3D-based Video Recognition Acceleration by Leveraging Temporal Locality paper note	Huixiang Chen; Tao Li	University of Florida
Inference; Quantum	A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology paper note	Ruizhe Cai; Ao Ren; Nobuyuki Yoshikawa; Yanzhi Wang	Northeastern University
Training; Reinforcement Learning; Distributed training	Accelerating Distributed Reinforcement Learning with In-Switch Computing paper note	Youjie Li; Jian Huang	UIUC
Training; Sparsity	Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks paper note	Jiaqi Zhang; Tao Li	University of Florida
Inference; Sparsity; Bit-serial	Laconic Deep Learning Inference Acceleration paper note	Sayeh Sharify; Andreas Moshovos	University of Toronto
Inference; Memory; bandwidth-saving; large-scale networks; compression	MnnFast: A Fast and Scalable System Architecture for Memory-Augmented Neural Networks paper note	Hanhwi Jang; Jangwoo Kim	POSTECH; Seoul National University
Inference; ReRAM; Sparsity	Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks paper note	Tzu-Hsien Yang	National Taiwan University; Academia Sinica; Macronix International
Inference; Redundant computing	TIE: Energy-efficient Tensor Train-based Inference Engine for Deep Neural Network paper note	Chunhua Deng; Bo Yuan	Rutgers University
Training; CNN; floating point	FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision paper note	Mohsen Imani; Tajana Rosing	UC San Diego
Training; Programming model	Cambricon-F: Machine Learning Computers with Fractal von Neumann Architecture paper note	Yongwei Zhao; Yunji Chen	ICT; Cambricon

2018

Tags	Title	Authors	Affiliations
Training; CNN; RNN	A Configurable Cloud-Scale DNN Processor for Real-Time AI paper note	Jeremy Fowers; Doug Burger	Microsoft
Inference; ReRAM	PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms paper note	Prakalp Srivastava; Mingu Kang	University of Illinois at Urbana-Champaign; IBM
Inference; Dataflow	Computation Reuse in DNNs by Exploiting Input Similarity paper slides note	Marc Riera; Antonio González	Universitat Politècnica de Catalunya
Spiking	Flexon: A Flexible Digital Neuron for Efficient Spiking Neural Network Simulations paper note slides	Dayeol Lee; Jangwoo Kim	Seoul National University; University of California
Space-time computing	Space-Time Algebra: A Model for Neocortical Computation paper slides note	James E. Smith	University of Wisconsin-Madison
Inference; Cross-module optimization	RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM paper note	Fengbin Tu; Shaojun Wei	Tsinghua University
Inference; Datapath: bit-serial	Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks paper note	Charles Eckert; Reetuparna Das	University of Michigan; Intel Corporation
Inference; Cross-module optimization	EVA2: Exploiting Temporal Redundancy in Live Computer Vision paper note slides	Mark Buckler; Adrian Sampson	Cornell University
Inference; CNN; Cross-module optimization; Power optimization	Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision paper slides note	Yuhao Zhu; Paul Whatmough	University of Rochester; ARM Research
Inference; GAN; Sparsity; MIMD; SIMD	GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks paper note	Amir Yazdanbakhsh; Hadi Esmaeilzadeh	Georgia Institute of Technology; UC San Diego; Qualcomm Technologies
Inference; CNN; Approximate	SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks paper note	Vahideh Akhlaghi; Hadi Esmaeilzadeh	Georgia Institute of Technology; UC San Diego; Qualcomm
Inference; CNN; Sparsity	UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition paper note	Kartik Hegde; Christopher W. Fletcher	University of Illinois at Urbana-Champaign; NVIDIA
Inference; Non-uniform	Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation paper note	Eunhyeok Park; Sungjoo Yoo	Seoul National University
Inference; Dataflow: Dynamic	Prediction Based Execution on Deep Neural Networks paper note	Mingcong Song; Tao Li	University of Florida
Inference; Datapath: bit-serial	Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network paper note	Hardik Sharma; Hadi Esmaeilzadeh	Georgia Institute of Technology; University of California
Training; memory: bandwidth-saving	Gist: Efficient Data Encoding for Deep Neural Network Training paper note	Animesh Jain; Gennady Pekhimenko	Microsoft Research; University of Toronto; University of Michigan
Inference; Cross-module optimization	The Dark Side of DNN Pruning paper note	Reza Yazdani; Antonio González	Universitat Politècnica de Catalunya

2017

Tags	Title	Authors	Affiliations
Inference	In-Datacenter Performance Analysis of a Tensor Processing Unit paper note	Norman P. Jouppi	Google
Inference; Dataflow	Maximizing CNN Accelerator Efficiency Through Resource Partitioning paper note	Yongming Shen	Stony Brook University
Training	SCALEDEEP: A Scalable Compute Architecture for Learning and Evaluating Deep Networks paper note	Swagath Venkataramani; Anand Raghunathan	Purdue University; Parallel Computing Lab; Intel Corporation
Inference; Algorithm-architecture-codesign	Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism paper note	Jiecao Yu; Scott Mahlke	University of Michigan; ARM
Inference; Sparsity	SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks paper note	Angshuman Parashar; William J. Dally	NVIDIA; MIT; UC-Berkeley; Stanford University
Training; Low-bit	Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent paper note	Christopher De Sa; Kunle Olukotun	Stanford University

2016

Tags	Title	Authors	Affiliations
Inference; Sparsity	Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing paper note	Jorge Albericio; Tayler Hetherington	University of Toronto; University of British Columbia
Inference; Analog	ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars paper note	Ali Shafiee; Vivek Srikumar	University of Utah; Hewlett Packard Labs
Inference; PIM	PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory paper note	Ping Chi; Yuan Xie	University of California
Inference; Sparsity	EIE: Efficient Inference Engine on Compressed Deep Neural Network paper note	Song Han; William J. Dally	Stanford University; NVIDIA
Inference; Analog	RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision paper note	Robert LiKamWa; Lin Zhong	Rice University
Inference; Architecture-Physical-Co-design	Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators paper note	Brandon Reagen; David Brooks	Harvard University
Inference; Dataflow	Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks paper note	Yu-Hsin Chen; Vivienne Sze	MIT; NVIDIA
Inference; 3D integration	Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory paper note	Duckhwan Kim; Saibal Mukhopadhyay	Georgia Institute of Technology
Inference	Cambricon: An Instruction Set Architecture for Neural Networks paper note	Shaoli Liu; Tianshi Chen	CAS; Cambricon Ltd.

2015

Tags	-	Title	Authors	Affiliations
Inference; Cross-module optimization		ShiDianNao: Shifting Vision Processing Closer to the Sensor paper note	Zidong Du	ICT

ASPLOS

2020

Tags	Title	Authors	Affiliations
Inference; Security	Shredder: Learning Noise Distributions to Protect Inference Privacy paper note	Fatemehsadat Mireshghallah; Mohammadkazem Taram; et.al.	UCSD
Algorithm-Architecture co-design; Security	DNNGuard: An Elastic Heterogeneous DNN Accelerator Architecture against Adversarial Attacks paper note	Xingbin Wang; Rui Hou; Boyan Zhao; et.al.	CAS; USC
programming model; Algorithm-Architecture co-design	Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators paper note	Xuan Yang; Mark Horowitz; et.al.	Stanford; THU
Algorithm-Architecture co-design; security	DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints paper note codes	Xing Hu; Yuan Xie; et.al.	UCSB
Training; distributed computing	Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training paper note	Qinyi Luo; Jiaao He; Youwei Zhuo; Xuehai Qian	USC
compression	PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning paper	Wei Niu; Xiaolong Ma; Sheng Lin; et.al.	College of William and Mary; Northeastern; USC
Power optimization; compute-memory trade-off	Capuchin: Tensor-based GPU Memory Management for Deep Learning paper note	Xuan Peng; Xuanhua Shi; Hulin Dai; et.al.	HUST; MSRA; USC
Compute-memory trade-off	NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units paper	Bongjoon Hyun; Youngeun Kwon; Yujeong Choi; et.al.	KAIST
Algorithm-Architecture co-design	FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System paper note codes	Size Zheng; Yun Liang; Shuo Wang; et.al.	PKU

2019

Tags	Title	Authors	Affiliations
Inference, ReRAM	PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference paper note	Aayush Ankit; Dejan S Milojičić; et.al.	Purdue; UIUC; HP
Reinforcement Learning	FA3C: FPGA-Accelerated Deep Reinforcement Learning paper note	Hyungmin Cho; Pyeongseok Oh; Jiyoung Park; et.al.	Hongik University; SNU
Inference, ReRAM	FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture paper note	Yu Ji; Yuan Xie; et.al.	THU; UCSB
Inference, Bit-serial	Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks paper note	Alberto Delmas Lascorz; Andreas Ioannis Moshovos; et.al.	Toronto; NVIDIA
Inference, Dataflow	TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators paper note codes	Mingyu Gao; Xuan Yang; Jing Pu; et.al.	Stanford
Inference, CNN, Systolic, Sparsity	Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization paper codes note	Hsiangtsung Kung; Bradley McDanel; Saiqian Zhang	Harvard
Training, CNN, Distributed computing	Split-CNN: Splitting Window-based Operations in Convolutional Neural Networks for Memory System Optimization paper note	Tian Jin; Seokin Hong	IBM; Kyungpook National University
Training, Distributed computing	HOP: Heterogeneity-Aware Decentralized Training paper note	Qinyi Luo; Jinkun Lin; Youwei Zhuo; Xuehai Qian	USC; THU
Training, Compiler	Astra: Exploiting Predictability to Optimize Deep Learning paper note	Muthian Sivathanu; Tapan Chugh; Sanjay S Singapuram; Lidong Zhou	Microsoft
Training, Quantization, Compression	ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers paper note	Ao Ren; Tianyun Zhang; Shaokai Ye; et.al.	Northeastern; Syracuse; SUNY; Buffalo; USC
Security	DeepSigns: An End-to-End Watermarking Framework for Protecting the Ownership of Deep Neural Networks paper note	Bita Darvish Rouhani; Huili Chen; Farinaz Koushanfar	UCSD

2018

Tags	Title	Authors	Affiliations
Compiler	Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler paper slides note	Yu Ji; Youhui Zhang; Wenguang Chen; Yuan Xie	Tsinghua; UCSB
Inference, Dataflow, NoC	MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects paper note slides	Hyoukjun Kwon; Ananda Samajdar; Tushar Krishna	Georgia Tech
Bayesian	VIBNN: Hardware Acceleration of Bayesian Neural Networks paper note	Ruizhe Cai; Ao Ren; Ning Liu; et.al.	Syracuse University; USC

2017

Tags	-	Title	Authors	Affiliations
Dataflow, 3D Integration		Tetris: Scalable and Efficient Neural Network Acceleration with 3D Memory paper note	Mingyu Gao; Jing Pu; Xuan Yang	Stanford University
CNN; Algorithm-Architecture co-design		SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing paper note	Ao Ren; Zhe Li; Caiwen Ding	Syracuse University; USC; The City College of New York

2015

Tags	-	Title	Authors	Affiliations
Inference		PuDianNao: A Polyvalent Machine Learning Accelerator paper note	Daofu Liu; Tianshi Chen; Shaoli Liu	CAS; USTC; Inria

2014

Tags	-	Title	Authors	Affiliations
Inference		DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning paper note	Tianshi Chen; Zidong Du; Ninghui Sun	CAS; Inria

MICRO

2020

Tags	Title	Authors	Affiliations
PIM/CIM; systolic	Look-Up Table based Energy Efficient Processing in Cache Support for Neural Network Acceleration paper note	Akshay Krishna Ramanathan	The Pennsylvania State University; Intel
PIM; cache; reconfigurable	FReaC Cache: Folded-Logic Reconfigurable Computing in the Last Level Cache paper note	Ashutosh Dhar	University of Illinois, Urbana-Champaign; IBM Research
Bayesian; sparsity	Fast-BCNN: Massive Neuron Skipping in Bayesian Convolutional Neural Networks paper note	Qiyu Wan	ECOMS Lab; University of Houston
low-bit	Non-Blocking Simultaneous Multithreading: Embracing the Resiliency of Deep Neural Networks paper note	Gil Shomron; Uri Weiser	Faculty of Electrical Engineering; Technion — Israel Institute of Technology
compiler	ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning paper note	Sheng-Chun Kao; Geonhwa Jeong; Tushar Krishna	Georgia Institute of Technology
algorithm-architecture co-design; cross-module optimization	VR-DANN: Real-Time Video Recognition via Decoder-Assisted Neural Network Acceleration paper note	Zhuoran Song; Feiyang Wu; Xueyuan Liu	Shanghai Jiao Tong University; Biren Research
PIM/CIM	Newton: A DRAM-Maker's Accelerator-in-Memory (AiM) Architecture for Machine Learning paper note	Mingxuan He	Purdue University
	Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks paper note	Soroush Ghodrati; Byung Hoon Ahn; Joon Kyung Kim	Bigstream Inc.; Kansas University; University of Illinois Urbana-Champaign; NVIDIA Research; Google Inc.
training; sparsity	Procrustes: A Dataflow and Accelerator for Sparse Deep Neural Network Training paper note	Dingqing Yang; Amin Ghasemazar; Xiaowei Ren	The University of British Columbia; Microsoft Corporation
GPU; tensor core; compiler; bandwidth saving	Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores paper note	Hyeonjin Kim; Sungwoo Ahn; Yunho Oh	Yonsei University; EcoCloud
algorithm-architecture co-design; compute-memory tradeoff	DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture paper note	Liu Liu	UC Santa Barbara
inference; compression	TFE: Energy-Efficient Transferred Filter-Based Engine to Compress and Accelerate Convolutional Neural Networks paper note	Huiyu Mo; Leibo Liu; Wenjing Hu	Tsinghua University; Intel
training; sparsity	TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training paper note	Mostafa Mahmoud; Isak Edo; Ali Hadi Zadeh	University of Toronto; Cerebras Systems; Vector Institute
training; inference; sparsity; CPU	SAVE: Sparsity-Aware Vector Engine for Accelerating DNN Training and Inference on CPUs paper note	Zhangxiaowen Gong; Houxiang Ji	University of Illinois at Urbana-Champaign; Intel
NLP; sparsity; bandwidth saving	GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference paper note	Ali Hadi Zadeh; Isak Edo; Omar Mohamed Awad	University of Toronto
training; cross-module optimization	TrainBox: An Extreme-Scale Neural Network Training Server Architecture by Systematically Balancing Operations paper note	Pyeongsu Park; Heetaek Jeong; Jangwoo Kim	Seoul National University

2019

Tags	Title	Authors	Affiliations
compute-memory trade-off; Dataflow	Wire-Aware Architecture and Dataflow for CNN Accelerators paper note	Sumanth Gudaparthi; Surya Narayanan; Rajeev Balasubramonian; Edouard Giacomin; Hari Kambalasubramanyam; Pierre-Emmanuel Gaillardon	Utah
security; compute-memory trade-off	ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning paper note	Shang-Tse Chen; Cory Cornelius; Jason Martin; Duen Horng Chau	Georgia Tech; Intel
Inference; NoC; Cross-Module optimization	Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture paper note slides	Yakun Sophia Shao; Jason Clemons; Rangharajan Venkatesan; et.al.	NVIDIA
compression; ISA; Cross-Module optimization	ZCOMP: Reducing DNN Cross-Layer Memory Footprint Using Vector Extensions paper note	Berkin Akin; Zeshan A. Chishti; Alaa R. Alameldeen	Google; Intel
Algorithm-Architecture co-design	Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating paper note	Weizhe Hua; Yuan Zhou; Christopher De Sa; et.al.	Cornell
Sparsity	SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks paper note	Ashish Gondimalla; Noah Chesnut; et.al.	Purdue
Power-optimization; Approximate	EDEN: Enabling Approximate DRAM for DNN Inference using Error-Resilient Neural Networks paper note	Skanda Koppula; Lois Orosa; A. Giray Yağlıkçı; et.al.	ETHZ
inference; CNN	eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference paper note	Chao-Tsung Huang; Yu-Chun Ding; Huan-Ching Wang; et.al.	NTHU
Architecture-Physical co-design	TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning paper note	Youngeun Kwon; Yunjae Lee; Minsoo Rhu	KAIST
Architecture-Physical co-design; dataflow	Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach paper note	Hyoukjun Kwon; Prasanth Chatarasi; Michael Pellauer; et.al.	Georgia Tech; NVIDIA
sparsity; inference	MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation paper note	Lillian Pentecost; Marco Donato; Brandon Reagen; et.al.	Harvard; Facebook
RNN; Special operation	Neuron-Level Fuzzy Memoization in RNNs paper note	Franyell Silfa; Gem Dot; Jose-Maria Arnau; et.al.	UPC
inference; Algorithm-Architecture co-design	Manna: An Accelerator for Memory-Augmented Neural Networks paper note	Jacob R. Stevens; Ashish Ranjan; Dipankar Das; et.al.	Purdue; Intel
PIM	eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing paper note	Elaheh Sadredini; Reza Rahimi; Vaibhav Verma; et.al.	Virginia
Sparsity	ExTensor: An Accelerator for Sparse Tensor Algebra paper note	Kartik Hegde; Hadi Asghari-Moghaddam; Michael Pellauer	UIUC; NVIDIA
Sparsity; Algorithm-Architecture co-design	Efficient SpMV Operation for Large and Highly Sparse Matrices Using Scalable Multi-Way Merge Parallelization paper note	Fazle Sadi; Joe Sweeney; Tze Meng Low; et.al.	CMU
sparsity; Algorithm-Architecture co-design; compression	Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs paper note	Maohua Zhu; Tao Zhang; Yuan Xie	UCSB; Alibaba
special operation; inference	ASV: Accelerated Stereo Vision System paper note codes1 codes2	Yu Feng; Paul Whatmough; Yuhao Zhu	Rochester
Algorithm-Architecture co-design; special operation	Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach paper note	Mingyu Yan; Xing Hu; Shuangchen Li; et.al.	UCSB; ICT

2018

Tags	Title	Authors	Affiliations
Sparsity	Cambricon-s: Addressing Irregularity in Sparse Neural Networks: A Cooperative Software/Hardware Approach paper note	Xuda Zhou; Zidong Du; Qi Guo; Shaoli Liu; Chengsi Liu; Chao Wang; Xuehai Zhou; Ling Li; Tianshi Chen; Yunji Chen	USTC; CAS
Inference; CNN; spatial correlation	Diffy: a Deja vu-Free Differential Deep Neural Network Accelerator paper note	Mostafa Mahmoud; Kevin Siu; Andreas Moshovos	University of Toronto
Distributed computing	Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning paper note	Youngeun Kwon; Minsoo Rhu	KAIST
RNN	Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs paper note	Xingyao Zhang; Chenhao Xie; Jing Wang; et.al.	University of Houston; Capital Normal University
Training, distributed computing, compression	A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks paper note	Youjie Li; Jongse Park; Mohammad Alian; et.al.	UIUC; THU; SJTU; Intel; UCSD
Inference, sparsity, compression	PermDNN: Efficient Compressed Deep Neural Network Architecture with Permuted Diagonal Matrices paper note	Chunhua Deng; Siyu Liao; Yi Xie; et.al.	City University of New York; University of Minnesota; USC
Reinforcement Learning, algorithm-architecture co-design	GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware paper note	Ananda Samajdar; Parth Mannan; Kartikay Garg; Tushar Krishna	Georgia Tech
Training, PIM	Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach paper note	Jiawen Liu; Hengyu Zhao; et.al.	UCM; UCSD; UCSC
GAN, PIM	LerGAN: A Zero-free, Low Data Movement and PIM-based GAN Architecture paper note	Haiyu Mao; Mingcong Song; Tao Li; et.al.	THU; University of Florida
Training, special operation, dataflow	Multi-dimensional Parallel Training of Winograd Layer on Memory-centric Architecture paper note	Byungchul Hong; Yeonju Ro; John Kim	KAIST
PIM/CIM	SCOPE: A Stochastic Computing Engine for DRAM-based In-situ Accelerator paper note	Shuangchen Li; Alvin Oliver Glova; Xing Hu; et.al.	UCSB; Samsung
Inference, algorithm-architecture co-design	Morph: Flexible Acceleration for 3D CNN-based Video Understanding paper note	Kartik Hegde; Rohit Agrawal; Yulun Yao; Christopher W Fletcher	UIUC

2017

Tags	Title	Authors	Affiliations
Bit-serial	Bit-Pragmatic Deep Neural Network Computing paper note	Jorge Albericio; Alberto Delmás; Patrick Judd; et.al.	NVIDIA; University of Toronto
CNN, Special computing	CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices paper note	Caiwen Ding; Siyu Liao; Yanzhi Wang; et.al.	Syracuse University; City University of New York; USC; California State University; Northeastern University
PIM	DRISA: A DRAM-based Reconfigurable In-Situ Accelerator paper note	Shuangchen Li; Dimin Niu; et.al.	UCSB; Samsung
Distributed computing	Scale-Out Acceleration for Machine Learning paper note	Jongse Park; Hardik Sharma; Divya Mahajan; et.al.	Georgia Tech; UCSD
DNN, Sparsity, Bandwidth saving	DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission paper note	Parker Hill; Animesh Jain; Mason Hill; et.al.	Univ. of Michigan; Univ. of Nevada

2016

Tags	Title	Authors	Affiliations
DNN, compiler, Dataflow	From High-Level Deep Neural Models to FPGAs paper note	Hardik Sharma; Jongse Park; Divya Mahajan; et.al.	Georgia Institute of Technology; Intel
DNN, Runtime, training	vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design paper note	Minsoo Rhu; Natalia Gimelshein; Jason Clemons; et.al.	NVIDIA
Bit-serial	Stripes: Bit-Serial Deep Neural Network Computing paper note	Patrick Judd; Jorge Albericio; Tayler Hetherington; et.al.	University of Toronto; University of British Columbia
Sparsity	Cambricon-X: An Accelerator for Sparse Neural Networks paper note	Shijin Zhang; Zidong Du; Lei Zhang; et.al.	Chinese Academy of Sciences
Neuromorphic, Spiking, programming model	NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints paper note	Yu Ji; YouHui Zhang; ShuangChen Li; et.al.	Tsinghua University; UCSB
Cross Module optimization	Fused-Layer CNN Accelerators paper note	Manoj Alwani; Han Chen; Michael Ferdman; Peter Milder	Stony Brook University
power optimization, cross module optimization	A Patch Memory System For Image Processing and Computer Vision paper note	Jason Clemons; Chih-Chi Cheng; Iuri Frosio; Daniel Johnson; Stephen W. Keckler	NVIDIA; Qualcomm
power optimization	An Ultra Low-Power Hardware Accelerator for Automatic Speech Recognition paper note	Reza Yazdani; Albert Segura; Jose-Maria Arnau; Antonio Gonzalez	Universitat Politecnica de Catalunya

2014

Tags	-	Title	Authors	Affiliations
Inference, CNN		DaDianNao: A Machine-Learning Supercomputer paper note	Yunji Chen; Tao Luo; Shaoli Liu; et.al.	CAS; Inria; Inner Mongolia University

HPCA

2020

Tags	Title	Authors	Affiliations
ReRam	Deep Learning Acceleration with Neuron-to-Memory Transformation Paper note	Mohsen Imani; Mohammad Samragh Razlighi; Yeseong Kim; et.al.	UCSD
graph network	HyGCN: A GCN Accelerator with Hybrid Architecture Paper note	Mingyu Yan; Lei Deng; Xing Hu; et.al.	ICT; UCSB
training; sparsity	SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training Paper note Slides	Eric Qin; Ananda Samajdar; Hyoukjun Kwon; et.al.	Georgia Tech
Programming model; DNN	PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible NPUs Paper note	Yujeong Choi; Minsoo Rhu	KAIST
sparsity; compute-memory trade-off	ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator Paper note	Bahar Asgari; Ramyad Hadidi; Tushar Krishna; et.al.	Georgia Tech
sparsity; Algorithm-Architecture co-design	SpArch: Efficient Architecture for Sparse Matrix Multiplication Paper note Project	Zhekai Zhang; Hanrui Wang; Song Han; William J. Dally	MIT; NVIDIA
Algorithm-Architecture co-design; Approximation	A3: Accelerating Attention Mechanisms in Neural Networks with Approximation Paper note	Tae Jun Ham; Sung Jun Jung; Seonghak Kim; et.al.	SNU
training; Architecture-Physical co-design	AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerator Arrays Paper note	Linghao Song; Fan Chen; Youwei Zhuo; et.al.	Duke; USC
Special operation, architecture-physical co-design	PIXEL: Photonic Neural Network Accelerator Paper note	Kyle Shiflett; Dylan Wright; Avinash Karanth; Ahmed Louri	Ohio; George Washington
Capsule; PIM	Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design Paper note	Xingyao Zhang; Shuaiwen Leon Song; Chenhao Xie; et.al.	Houston
Bandwidth saving	Communication Lower Bound in Convolution Accelerators Paper note	Xiaoming Chen; Yinhe Han; Yu Wang	ICT; THU
Training, Distributed computing; algorithm-architecture co-design	EFLOPS: Algorithm and System Co-design for a High Performance Distributed Training Platform Paper note	Jianbo Dong; Zheng Cao; Tao Zhang; et.al.	Alibaba
NoC	Experiences with ML-Driven Design: A NoC Case Study Paper note	Jieming Yin; Subhash Sethumurugan; Yasuko Eckert; et.al.	AMD
sparsity	Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations Paper note	Nitish Srivastava; Hanchen Jin; Shaden Smith; et.al.	Cornell; Intel
algorithm-architecture co-design	A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms Paper note	Jian Weng; Sihao Liu; Zhengrong Wang; et.al.	UCLA
Reinforcement Learning; NoC; algorithm-architecture co-design	A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study Paper note	Ting-Ru Lin; Drew Penney; Massoud Pedram; Lizhong Chen	USC; OSU
power optimization	Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices Paper note	Jawad Haj-Yahya; Yanos Sazeides; Mohammed Alser; et.al.	ETHZ; Cyprus; CMU

2019

Tags	Title	Authors	Affiliations
training; compute-memory trade-off	HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array paper note	Linghao Song; Jiachen Mao; Yiran Chen; et.al.	Duke; USC
RNN; algorithm-architecture co-design	E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs paper note	Zhe Li; Caiwen Ding; Siyue Wang	Syracuse University; Northeastern University; Florida International University; USC; University at Buffalo
CNN, Bit-serial, Sparsity	Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks paper note	Xiaowei Wang; Jiecao Yu; Charles Augustine; et.al.	Michigan; Intel
cross-Module optimization	Shortcut Mining: Exploiting Cross-layer Shortcut Reuse in DCNN Accelerators paper note	Arash Azizimazreah; Lizhong Chen	OSU
PIM/CIM, low-bit, binary	NAND-Net: Minimizing Computational Complexity of In-Memory Processing for Binary Neural Networks paper note	Hyeonuk Kim; Jaehyeong Sim; Yeongjae Choi; Lee-Sup Kim	KAIST
Accuracy-Latency trade-off	Kelp: QoS for Accelerators in Machine Learning Platforms paper note	Haishan Zhu; David Lo; Liqun Cheng	Microsoft; Google; UT Austin
inference	Machine Learning at Facebook: Understanding Inference at the Edge paper note	Carole-Jean Wu; David Brooks; Kevin Chen; et.al.	Facebook
Architecture-Physical co-design	The Accelerator Wall: Limits of Chip Specialization paper note codes	Adi Fuchs; David Wentzlaff	Princeton

2018

Tags	Title	Authors	Affiliations
special operation; approximate	Making Memristive Neural Network Accelerators Reliable paper note	Ben Feinberg; Shibo Wang; Engin Ipek	University of Rochester
Algorithm-Architecture co-design; GAN	Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-based Deep Learning paper note	Mingcong Song; Jiaqi Zhang; Huixiang Chen; Tao Li	University of Florida
compression; sparsity	Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks paper note	Minsoo Rhu; Mike O'Connor; Niladrish Chatterjee; et.al.	POSTECH; NVIDIA; UT-Austin
architecture-physical co-design; inference	In-situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems paper note	Mingcong Song; Kan Zhong; Tao Li; et.al.	University of Florida; Chongqing University; Capital Normal University
Special operation; ReRam	GraphR: Accelerating Graph Processing Using ReRAM paper note	Linghao Song; Youwei Zhuo; Xuehai Qian	Duke; USC
PIM; Special operation; dataflow	GraphP: Reducing Communication of PIM-based Graph Processing with Efficient Data Partition paper note	Mingxing Zhang; Youwei Zhuo; Chao Wang; et.al.	THU; USC; Stanford
Power optimization; PIM	PM3: Power Modeling and Power Management for Processing-in-Memory paper note	Chao Zhang; Tong Meng; Guangyu Sun	PKU

2017

Tags	Title	Authors	Affiliations
Inference, CNN, Dataflow	FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks paper note	Wenyan Lu; Guihai Yan; Jiajun Li; et.al.	Chinese Academy of Sciences
Inference, ReRAM	PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning paper note	Linghao Song; Xuehai Qian; Hai Li; Yiran Chen	University of Pittsburgh; University of Southern California
Training	Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures paper note	Mingcong Song; Yang Hu; Huixiang Chen; Tao Li	University of Florida

2016

Tags	-	Title	Authors	Affiliations
Programming model, training		TABLA: A Unified Template-based Architecture for Accelerating Statistical Machine Learning paper note	Divya Mahajan; Jongse Park; Emmanuel Amaro	Georgia Institute of Technology
ReRam; Boltzmann		Memristive Boltzmann Machine: A Hardware Accelerator for Combinatorial Optimization and Deep Learning paper note	Mahdi Nazm Bojnordi; Engin Ipek	University of Rochester
