This project aims to help engineers, researchers, and students easily find and learn from the ideas and designs in AI-related fields, such as AI/ML/DL accelerators, chips, and systems, proposed at the top-tier architecture conferences (ISCA, MICRO, ASPLOS, HPCA).
This project was initiated by the Advanced Computer Architecture Lab (ACA Lab) at Shanghai Jiao Tong University in collaboration with Biren Research. Articles from additional sources are being added. Please let us know if you have any comments or are willing to contribute.
For guidance and searching purposes, tags and/or notes are assigned to all of these papers. We will use the following tags to annotate these papers.
We list all AI-related articles collected. Links to the paper/slides/note are provided under the title of each article if available. Updating is in progress.
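Because the tags exist for searching, a small script can filter the list by tag. The following is a minimal Python sketch and not part of the project: the helper names `parse_tags` and `find_by_tag` are hypothetical, and it assumes the rows have already been extracted as (tags, title) pairs. It splits on both `;` and `,` because the tables use both separators, and it matches case-insensitively because tag capitalization varies.

```python
import re


def parse_tags(cell):
    """Split a Tags cell on ';' or ',' and normalise case and whitespace."""
    return {t.strip().lower() for t in re.split(r"[;,]", cell) if t.strip()}


def find_by_tag(rows, tag):
    """Return the titles of rows whose tag set contains `tag` (case-insensitive)."""
    wanted = tag.strip().lower()
    return [title for cell, title in rows if wanted in parse_tags(cell)]


# Three (tags, title) pairs copied from the tables in this list.
rows = [
    ("Inference; Sparsity",
     "EIE: Efficient Inference Engine on Compressed Deep Neural Network"),
    ("Training; Sparsity",
     "Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks"),
    ("Inference, Dataflow",
     "TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators"),
]

# Hypothetical query: every paper tagged with sparsity.
for title in find_by_tag(rows, "sparsity"):
    print(title)
```

Compound tags such as `Datapath: bit-serial` are kept whole by this sketch, so they have to be searched by their full text.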
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference; SIMD | High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs paper note |
Glenn Henry; Parviz Palangpour | Centaur Technology | |
Inference; dataflow | Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workload paper note |
Dennis Abts; Jonathan Ross | Groq Inc. | |
Spiking; dataflow; Sparsity | SpinalFlow: An Architecture and Dataflow Tailored for Spiking Neural Networks paper note |
Surya Narayanan; Karl Taht | University of Utah | |
Inference; benchmarking | MLPerf Inference Benchmark paper note |
Vijay Janapa Reddi; Lingjie Xu; et.al. | ||
GPU; Compression | Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs paper note |
Esha Choukse; Michael Sullivan | University of Texas at Austin; NVIDIA | |
Inference; runtime | A Multi-Neural Network Acceleration Architecture paper note |
Eunjin Baek; Dongup Kwon; Jangwoo Kim | Seoul National University | |
Inference; Dynamic fixed-point | DRQ: Dynamic Region-Based Quantization for Deep Neural Network Acceleration paper note |
Zhuoran Song; Naifeng Jing; Xiaoyao Liang | Shanghai Jiao Tong University | |
Training; LSTM; GPU | Echo: Compiler-Based GPU Memory Footprint Reduction for LSTM RNN Training paper note |
Bojian Zheng; Nandita Vijaykumar | University of Toronto | |
Inference | DeepRecSys: A System for Optimizing End-to-End At-Scale Neural Recommendation paper note |
Udit Gupta; Samuel Hsia; Vikram Saraph | Harvard University; Facebook Inc |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference, Dataflow | 3D-based Video Recognition Acceleration by Leveraging Temporal Locality paper note |
Huixiang Chen; Tao Li | University of Florida | |
Inference; Quantum | A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology paper note |
Ruizhe Cai; Ao Ren; Nobuyuki Yoshikawa; Yanzhi Wang | Northeastern University | |
Training; Reinforcement Learning; Distributed training | Accelerating Distributed Reinforcement Learning with In-Switch Computing paper note |
Youjie Li; Jian Huang | UIUC | |
Training; Sparsity | Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks paper note |
Jiaqi Zhang; Tao Li | University of Florida | |
Inference; Sparsity; Bit-serial | Laconic Deep Learning Inference Acceleration paper note |
Sayeh Sharify; Andreas Moshovos | University of Toronto | |
Inference; Memory; bandwidth-saving; large-scale networks; compression | MnnFast: A Fast and Scalable System Architecture for Memory-Augmented Neural Networks paper note |
Hanhwi Jang; Jangwoo Kim | POSTECH; Seoul National University | |
Inference; ReRAM; Sparsity | Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks paper note |
Tzu-Hsien Yang | National Taiwan University; Academia Sinica; Macronix International. | |
Inference; Redundant computing | TIE: Energy-efficient Tensor Train-based Inference Engine for Deep Neural Network paper note |
Chunhua Deng; Bo Yuan | Rutgers University | |
Training; CNN; floating point | FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision paper note |
Mohsen Imani; Tajana Rosing | UC San Diego | |
Training; Programming model | Cambricon-F: Machine Learning Computers with Fractal von Neumann Architecture paper note |
Yongwei Zhao; Yunji Chen | ICT; Cambricon |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Training;CNN; RNN | A Configurable Cloud-Scale DNN Processor for Real-Time AI paper note |
Jeremy Fowers; Doug Burger | Microsoft | |
Inference; ReRAM | PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms paper note |
Prakalp Srivastava; Mingu Kang | University of Illinois at Urbana-Champaign; IBM | |
Inference; Dataflow | Computation Reuse in DNNs by Exploiting Input Similarity paper slides note |
Marc Riera; Antonio González | Universitat Politècnica de Catalunya | |
Spiking | Flexon: A Flexible Digital Neuron for Efficient Spiking Neural Network Simulations paper note slides |
Dayeol Lee; Jangwoo Kim | Seoul National University; University of California | |
Space-time computing | Space-Time Algebra: A Model for Neocortical Computation paper slides note |
James E. Smith | University of Wisconsin-Madison | |
Inference; Cross-module optimization | RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM paper note |
Fengbin Tu; Shaojun Wei | Tsinghua University | |
Inference;Datapath: bit-serial | Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks paper note |
Charles Eckert; Reetuparna Das | University of Michigan; Intel Corporation | |
Inference;Cross-module optimization | EVA2: Exploiting Temporal Redundancy in Live Computer Vision paper note slides |
Mark Buckler; Adrian Sampson | Cornell University | |
Inference;CNN; Cross-module optimization; Power optimization | Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision paper slides note |
Yuhao Zhu; Paul Whatmough | University of Rochester; ARM Research | |
Inference;GAN; Sparsity; MIMD; SIMD | GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks paper note |
Amir Yazdanbakhsh; Hadi Esmaeilzadeh | Georgia Institute of Technology; UC San Diego; Qualcomm Technologies | |
Inference; CNN; Approximate | SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks paper note |
Vahideh Akhlaghi; Hadi Esmaeilzadeh | Georgia Institute of Technology; UC San Diego; Qualcomm | |
Inference;CNN; Sparsity; | UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition paper note |
Kartik Hegde; Christopher W. Fletcher | University of Illinois at Urbana-Champaign; NVIDIA | |
Inference; Non-uniform | Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation paper note |
Eunhyeok Park; Sungjoo Yoo | Seoul National University | |
Inference; Dataflow: Dynamic | Prediction Based Execution on Deep Neural Networks paper note |
Mingcong Song; Tao Li | University of Florida | |
Inference; Datapath: bit-serial | Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network paper note |
Hardik Sharma; Hadi Esmaeilzadeh | Georgia Institute of Technology; University of California | |
Training; memory: bandwidth-saving | Gist: Efficient Data Encoding for Deep Neural Network Training paper note |
Animesh Jain; Gennady Pekhimenko | Microsoft Research; University of Toronto; University of Michigan | |
Inference; Cross-module optimization | The Dark Side of DNN Pruning paper note |
Reza Yazdani; Antonio González | Universitat Politècnica de Catalunya |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference | In-Datacenter Performance Analysis of a Tensor Processing Unit paper note |
Norman P. Jouppi | ||
Inference; Dataflow | Maximizing CNN Accelerator Efficiency Through Resource Partitioning paper note |
Yongming Shen | Stony Brook University | |
Training | SCALEDEEP: A Scalable Compute Architecture for Learning and Evaluating Deep Networks paper note |
Swagath Venkataramani; Anand Raghunathan | Purdue University; Parallel Computing Lab; Intel Corporation | |
Inference; Algorithm-architecture-codesign | Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism paper note |
Jiecao Yu; Scott Mahlke | University of Michigan; ARM | |
Inference; Sparsity | SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks paper note |
Angshuman Parashar; William J. Dally | NVIDIA; MIT; UC-Berkeley; Stanford University | |
Training; Low-bit | Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent paper note |
Christopher De Sa; Kunle Olukotun | Stanford University |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference;Sparsity | Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing paper note |
Jorge Albericio; Tayler Hetherington | University of Toronto; University of British Columbia | |
Inference; Analog | ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars paper note |
Ali Shafiee; Vivek Srikumar | University of Utah; Hewlett Packard Labs | |
Inference; PIM | PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory paper note |
Ping Chi; Yuan Xie | University of California | |
Inference; Sparsity | EIE: Efficient Inference Engine on Compressed Deep Neural Network paper note |
Song Han; William J. Dally | Stanford University; NVIDIA | |
Inference; Analog | RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision paper note |
Robert LiKamWa; Lin Zhong | Rice University | |
Inference; Architecture-Physical-Co-design | Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators paper note |
Brandon Reagen; David Brooks | Harvard University | |
Inference; Dataflow | Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks paper note |
Yu-Hsin Chen; Vivienne Sze | MIT; NVIDIA | |
Inference; 3D integration | Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory paper note |
Duckhwan Kim; Saibal Mukhopadhyay | Georgia Institute of Technology | |
Inference | Cambricon: An Instruction Set Architecture for Neural Networks paper note |
Shaoli Liu; Tianshi Chen | CAS; Cambricon Ltd. |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference; Cross-module optimization | ShiDianNao: Shifting Vision Processing Closer to the Sensor paper note |
Zidong Du | ICT |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference; Security | Shredder: Learning Noise Distributions to Protect Inference Privacy paper note |
Fatemehsadat Mireshghallah; Mohammadkazem Taram; et.al. | UCSD | |
Algorithm-Architecture co-design; Security | DNNGuard: An Elastic Heterogeneous DNN Accelerator Architecture against Adversarial Attacks paper note |
Xingbin Wang; Rui Hou; Boyan Zhao; et.al. | CAS; USC | |
programming model; Algorithm-Architecture co-design | Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators paper note |
Xuan Yang; Mark Horowitz; et.al. | Stanford; THU | |
Algorithm-Architecture co-design; security | DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints paper note codes |
Xing Hu; Yuan Xie; et.al. | UCSB | |
Training; distributed computing | Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training paper note |
Qinyi Luo; Jiaao He; Youwei Zhuo; Xuehai Qian | USC | |
compression | PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning paper |
Wei Niu; Xiaolong Ma; Sheng Lin; et.al. | College of William and Mary; Northeastern; USC | |
Power optimization; compute-memory trade-off | Capuchin: Tensor-based GPU Memory Management for Deep Learning paper note |
Xuan Peng; Xuanhua Shi; Hulin Dai; et.al. | HUST; MSRA; USC | |
Compute-memory trade-off | NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units paper |
Bongjoon Hyun; Youngeun Kwon; Yujeong Choi; et.al. | KAIST | |
Algorithm-Architecture co-design | FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System paper note codes |
Size Zheng; Yun Liang; Shuo Wang; et.al. | PKU |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference, ReRAM | PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference paper note |
Aayush Ankit; Dejan S Milojičić; et.al. | Purdue; UIUC; HP | |
Reinforcement Learning | FA3C: FPGA-Accelerated Deep Reinforcement Learning paper note |
Hyungmin Cho; Pyeongseok Oh; Jiyoung Park; et.al. | Hongik University; SNU | |
Inference, ReRAM | FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture paper note |
Yu Ji; Yuan Xie; et.al. | THU; UCSB | |
Inference, Bit-serial | Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks paper note |
Alberto Delmas Lascorz; Andreas Ioannis Moshovos; et.al. | Toronto; NVIDIA | |
Inference, Dataflow | TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators paper note codes |
Mingyu Gao; Xuan Yang; Jing Pu; et.al. | Stanford | |
Inference, CNN, Systolic, Sparsity | Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization paper codes note |
Hsiangtsung Kung; Bradley McDanel; Saiqian Zhang | Harvard | |
Training, CNN, Distributed computing | Split-CNN: Splitting Window-based Operations in Convolutional Neural Networks for Memory System Optimization paper note |
Tian Jin; Seokin Hong | IBM; Kyungpook National University | |
Training, Distributed computing | HOP: Heterogeneity-Aware Decentralized Training paper note |
Qinyi Luo; Jinkun Lin; Youwei Zhuo; Xuehai Qian | USC; THU | |
Training, Compiler | Astra: Exploiting Predictability to Optimize Deep Learning paper note |
Muthian Sivathanu; Tapan Chugh; Sanjay S Singapuram; Lidong Zhou | Microsoft | |
Training, Quantization, Compression | ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers paper note |
Ao Ren; Tianyun Zhang; Shaokai Ye; et.al. | Northeastern; Syracuse; SUNY; Buffalo; USC | |
Security | DeepSigns: An End-to-End Watermarking Framework for Protecting the Ownership of Deep Neural Networks paper note |
Bita Darvish Rouhani; Huili Chen; Farinaz Koushanfar | UCSD |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Compiler | Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler paper slides note |
Yu Ji; Youhui Zhang; Wenguang Chen; Yuan Xie | Tsinghua; UCSB | |
Inference, Dataflow, NoC | MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects paper note slides |
Hyoukjun Kwon; Ananda Samajdar; Tushar Krishna | Georgia Tech | |
Bayesian | VIBNN: Hardware Acceleration of Bayesian Neural Networks paper note |
Ruizhe Cai; Ao Ren; Ning Liu; et.al. | Syracuse University; USC |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Dataflow, 3D Integration | Tetris: Scalable and Efficient Neural Network Acceleration with 3D Memory paper note |
Mingyu Gao; Jing Pu; Xuan Yang | Stanford University | |
CNN; Algorithm-Architecture co-design | SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing paper note |
Ao Ren; Zhe Li; Caiwen Ding | Syracuse University; USC; The City College of New York |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference | PuDianNao: A Polyvalent Machine Learning Accelerator paper note |
Daofu Liu; Tianshi Chen; Shaoli Liu | CAS; USTC; Inria |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference | DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning paper note |
Tianshi Chen; Zidong Du; Ninghui Sun | CAS; Inria |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
PIM/CIM; systolic | Look-Up Table based Energy Efficient Processing in Cache Support for Neural Network Acceleration paper note |
Akshay Krishna Ramanathan | The Pennsylvania State University; Intel | |
PIM; cache; reconfigurable | FReaC Cache: Folded-Logic Reconfigurable Computing in the Last Level Cache paper note |
Ashutosh Dhar | University of Illinois at Urbana-Champaign; IBM Research | |
Bayesian; sparsity | Fast-BCNN: Massive Neuron Skipping in Bayesian Convolutional Neural Networks paper note |
Qiyu Wan | ECOMS Lab; University of Houston | |
low-bit | Non-Blocking Simultaneous Multithreading: Embracing the Resiliency of Deep Neural Networks paper note |
Gil Shomron; Uri Weiser | Faculty of Electrical Engineering; Technion — Israel Institute of Technology | |
compiler | ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning paper note |
Sheng-Chun Kao; Geonhwa Jeong; Tushar Krishna | Georgia Institute of Technology | |
algorithm-architecture co-design; cross-module optimization | VR-DANN: Real-Time Video Recognition via Decoder-Assisted Neural Network Acceleration paper note |
Zhuoran Song; Feiyang Wu; Xueyuan Liu | Shanghai Jiao Tong University; Biren Research | |
PIM/CIM | Newton: A DRAM-Maker's Accelerator-in-Memory (AiM) Architecture for Machine Learning paper note |
Mingxuan He | Purdue University | |
Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks paper note |
Soroush Ghodrati; Byung Hoon Ahn; Joon Kyung Kim | Bigstream Inc.; Kansas University; University of Illinois Urbana-Champaign; NVIDIA Research; Google Inc. | |
training; sparsity | Procrustes: A Dataflow and Accelerator for Sparse Deep Neural Network Training paper note |
Dingqing Yang; Amin Ghasemazar; Xiaowei Ren | The University of British Columbia; Microsoft Corporation | |
GPU; tensor core; compiler; bandwidth saving | Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores paper note |
Hyeonjin Kim; Sungwoo Ahn; Yunho Oh | Yonsei University; EcoCloud | |
algorithm-architecture co-design; compute-memory tradeoff | DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture paper note |
Liu Liu | UC Santa Barbara | |
inference; compression | TFE: Energy-Efficient Transferred Filter-Based Engine to Compress and Accelerate Convolutional Neural Networks paper note |
Huiyu Mo; Leibo Liu; Wenjing Hu | Tsinghua University;Intel | |
training; sparsity | TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training paper note |
Mostafa Mahmoud; Isak Edo; Ali Hadi Zadeh | University of Toronto;Cerebras Systems;Vector Institute | |
training; inference; sparsity; CPU | SAVE: Sparsity-Aware Vector Engine for Accelerating DNN Training and Inference on CPUs paper note |
Zhangxiaowen Gong; Houxiang Ji | University of Illinois at Urbana-Champaign; Intel | |
NLP; sparsity; bandwidth saving | GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference paper note |
Ali Hadi Zadeh; Isak Edo; Omar Mohamed Awad | University of Toronto | |
training; cross-module optimization | TrainBox: An Extreme-Scale Neural Network Training Server Architecture by Systematically Balancing Operations paper note |
Pyeongsu Park; Heetaek Jeong; Jangwoo Kim | Seoul National University |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
compute-memory trade-off; Dataflow | Wire-Aware Architecture and Dataflow for CNN Accelerators paper note |
Sumanth Gudaparthi; Surya Narayanan; Rajeev Balasubramonian ; Edouard Giacomin ; Hari Kambalasubramanyam; Pierre-Emmanuel Gaillardon | Utah | |
security; compute-memory trade-off | ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning paper note |
Shang-Tse Chen; Cory Cornelius; Jason Martin; Duen Horng Chau | Georgia Tech; Intel | |
Inference; NoC; Cross-Module optimization | Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture paper note slides |
Yakun Sophia Shao; Jason Clemons; Rangharajan Venkatesan; et.al. | NVIDIA | |
compression; ISA; Cross-Module optimization | ZCOMP: Reducing DNN Cross-Layer Memory Footprint Using Vector Extensions paper note |
Berkin Akin; Zeshan A. Chishti; Alaa R. Alameldeen | Google; Intel | |
Algorithm-Architecture co-design | Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating paper note |
Weizhe Hua; Yuan Zhou; Christopher De Sa; et.al. | Cornell | |
Sparsity | SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks paper note |
Ashish Gondimalla; Noah Chesnut; et.al. | Purdue | |
Power-optimization; Approximate; | EDEN: Enabling Approximate DRAM for DNN Inference using Error-Resilient Neural Networks paper note |
Skanda Koppula; Lois Orosa; A. Giray Yağlıkçı; et.al. | ETHZ | |
inference; CNN | eCNN: a Block-Based and Highly-Parallel CNN Accelerator for Edge Inference paper note |
Chao-Tsung Huang; Yu-Chun Ding; Huan-Ching Wang; et.al. | NTHU | |
Architecture-Physical co-design | TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning paper note |
Youngeun Kwon; Yunjae Lee; Minsoo Rhu | KAIST | |
Architecture-Physical co-design; dataflow | Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach paper note |
Hyoukjun Kwon; Prasanth Chatarasi; Michael Pellauer; et.al. | Georgia Tech; NVIDIA | |
sparsity; inference; | MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation paper note |
Lillian Pentecost; Marco Donato; Brandon Reagen; et.al. | Harvard; Facebook | |
RNN; Special operation; | Neuron-Level Fuzzy Memoization in RNNs paper note |
Franyell Silfa; Gem Dot; Jose-Maria Arnau; et.al. | UPC | |
inference; Algorithm-Architecture co-design; | Manna: An Accelerator for Memory-Augmented Neural Networks paper note |
Jacob R. Stevens; Ashish Ranjan; Dipankar Das; et.al. | Purdue; Intel | |
PIM | eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing paper note |
Elaheh Sadredini; Reza Rahimi; Vaibhav Verma; et.al. | Virginia | |
Sparsity | ExTensor: An Accelerator for Sparse Tensor Algebra paper note |
Kartik Hegde; Hadi Asghari-Moghaddam; Michael Pellauer | UIUC; NVIDIA | |
Sparsity; Algorithm-Architecture co-design | Efficient SpMV Operation for Large and Highly Sparse Matrices Using Scalable Multi-Way Merge Parallelization paper note |
Fazle Sadi; Joe Sweeney; Tze Meng Low; et.al. | CMU | |
sparsity; Algorithm-Architecture co-design; compression | Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs paper note |
Maohua Zhu; Tao Zhang; Yuan Xie | UCSB; Alibaba | |
special operation; inference | ASV: Accelerated Stereo Vision System paper note codes1 codes2 |
Yu Feng; Paul Whatmough; Yuhao Zhu | Rochester | |
Algorithm-Architecture co-design; special operation | Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach paper note |
Mingyu Yan; Xing Hu; Shuangchen Li; et.al. | UCSB; ICT |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Sparsity | Cambricon-s: Addressing Irregularity in Sparse Neural Networks: A Cooperative Software/Hardware Approach paper note |
Xuda Zhou; Zidong Du; Qi Guo; Shaoli Liu; Chengsi Liu; Chao Wang; Xuehai Zhou; Ling Li; Tianshi Chen; Yunji Chen | USTC; CAS | |
Inference; CNN; spatial correlation | Diffy: a Deja vu-Free Differential Deep Neural Network Accelerator paper note |
Mostafa Mahmoud; Kevin Siu; Andreas Moshovos | University of Toronto | |
Distributed computing | Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning paper note |
Youngeun Kwon; Minsoo Rhu | KAIST | |
RNN | Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs paper note |
Xingyao Zhang; Chenhao Xie; Jing Wang; et.al. | University of Houston; Capital Normal University | |
Training, distributed computing, compression | A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks paper note |
Youjie Li; Jongse Park; Mohammad Alian; et.al. | UIUC; THU; SJTU; Intel; UCSD | |
Inference, sparsity, compression | PermDNN: Efficient Compressed Deep Neural Network Architecture with Permuted Diagonal Matrices paper note |
Chunhua Deng; Siyu Liao; Yi Xie; et.al. | City University of New York; University of Minnesota; USC | |
Reinforcement Learning, algorithm-architecture co-design | GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware paper note |
Ananda Samajdar; Parth Mannan; Kartikay Garg; Tushar Krishna | Georgia Tech | |
Training, PIM | Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach paper note |
Jiawen Liu; Hengyu Zhao; et.al. | UCM; UCSD; UCSC | |
GAN, PIM | LerGAN: A Zero-free, Low Data Movement and PIM-based GAN Architecture paper note |
Haiyu Mao; Mingcong Song; Tao Li; et.al. | THU; University of Florida | |
Training, special operation, dataflow | Multi-dimensional Parallel Training of Winograd Layer on Memory-centric Architecture paper note |
Byungchul Hong; Yeonju Ro; John Kim | KAIST | |
PIM/CIM | SCOPE: A Stochastic Computing Engine for DRAM-based In-situ Accelerator paper note |
Shuangchen Li; Alvin Oliver Glova; Xing Hu; et.al. | UCSB; Samsung | |
Inference, algorithm-architecture co-design | Morph: Flexible Acceleration for 3D CNN-based Video Understanding paper note |
Kartik Hegde; Rohit Agrawal; Yulun Yao; Christopher W Fletcher | UIUC |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Bit-serial | Bit-Pragmatic Deep Neural Network Computing paper note |
Jorge Albericio; Alberto Delmás; Patrick Judd; et.al. | NVIDIA; University of Toronto | |
CNN, Special computing | CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices paper note |
Caiwen Ding; Siyu Liao; Yanzhi Wang; et.al. | Syracuse University; City University of New York; USC; California State University; Northeastern University | |
PIM | DRISA: A DRAM-based Reconfigurable In-Situ Accelerator paper note |
Shuangchen Li; Dimin Niu; et.al. | UCSB; Samsung | |
Distributed computing | Scale-Out Acceleration for Machine Learning paper note |
Jongse Park; Hardik Sharma; Divya Mahajan; et.al. | Georgia Tech; UCSD | |
DNN, Sparsity, Bandwidth saving | DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission paper note |
Parker Hill; Animesh Jain; Mason Hill; et.al. | Univ. of Michigan; Univ. of Nevada |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
DNN, compiler, Dataflow | From High-Level Deep Neural Models to FPGAs paper note |
Hardik Sharma; Jongse Park; Divya Mahajan; et.al. | Georgia Institute of Technology; Intel | |
DNN, Runtime, training | vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design paper note |
Minsoo Rhu; Natalia Gimelshein; Jason Clemons; et.al. | NVIDIA | |
Bit-serial | Stripes: Bit-Serial Deep Neural Network Computing paper note |
Patrick Judd; Jorge Albericio; Tayler Hetherington; et.al. | University of Toronto; University of British Columbia | |
Sparsity | Cambricon-X: An Accelerator for Sparse Neural Networks paper note |
Shijin Zhang; Zidong Du; Lei Zhang; et.al. | Chinese Academy of Sciences | |
Neuromorphic, Spiking, programming model | NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints paper note |
Yu Ji; YouHui Zhang; ShuangChen Li; et.al. | Tsinghua University; UCSB | |
Cross Module optimization | Fused-Layer CNN Accelerators paper note |
Manoj Alwani; Han Chen; Michael Ferdman; Peter Milder | Stony Brook University | |
power optimization, cross module optimization | A Patch Memory System For Image Processing and Computer Vision paper note |
Jason Clemons; Chih-Chi Cheng; Iuri Frosio; Daniel Johnson; Stephen W. Keckler | NVIDIA; Qualcomm | |
power optimization | An Ultra Low-Power Hardware Accelerator for Automatic Speech Recognition paper note |
Reza Yazdani; Albert Segura; Jose-Maria Arnau; Antonio Gonzalez | Universitat Politecnica de Catalunya |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference, CNN | DaDianNao: A Machine-Learning Supercomputer paper note |
Yunji Chen; Tao Luo; Shaoli Liu; et.al. | CAS; Inria; Inner Mongolia University |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
ReRAM | Deep Learning Acceleration with Neuron-to-Memory Transformation Paper note |
Mohsen Imani; Mohammad Samragh Razlighi; Yeseong Kim; et.al. | UCSD | |
graph network | HyGCN: A GCN Accelerator with Hybrid Architecture Paper note |
Mingyu Yan; Lei Deng; Xing Hu; et.al. | ICT; UCSB | |
training; sparsity | SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training Paper note Slides |
Eric Qin; Ananda Samajdar; Hyoukjun Kwon; et.al. | Georgia Tech | |
Programming model; DNN | PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible NPUs Paper note |
Yujeong Choi; Minsoo Rhu | KAIST | |
sparsity; compute-memory trade-off | ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator Paper note |
Bahar Asgari; Ramyad Hadidi; Tushar Krishna; et.al. | Georgia Tech | |
sparsity;Algorithm-Architecture co-design | SpArch: Efficient Architecture for Sparse Matrix Multiplication Paper note Project |
Zhekai Zhang; Hanrui Wang; Song Han; William J. Dally | MIT; NVIDIA | |
Algorithm-Architecture co-design; Approximation | A3: Accelerating Attention Mechanisms in Neural Networks with Approximation Paper note |
Tae Jun Ham; Sung Jun Jung; Seonghak Kim; et.al. | SNU | |
training; Architecture-Physical co-design | AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerator Arrays Paper note |
Linghao Song; Fan Chen; Youwei Zhuo; et.al. | Duke; USC | |
Special operation, architecture-physical co-design | PIXEL: Photonic Neural Network Accelerator Paper note |
Kyle Shiflett; Dylan Wright; Avinash Karanth; Ahmed Louri | Ohio; George Washington | |
Capsule; PIM | Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design Paper note |
Xingyao Zhang; Shuaiwen Leon Song; Chenhao Xie; et.al. | Houston | |
Bandwidth saving | Communication Lower Bound in Convolution Accelerators Paper note |
Xiaoming Chen; Yinhe Han; Yu Wang | ICT; THU | |
Training, Distributed computing; algorithm-architecture co-design | EFLOPS: Algorithm and System Co-design for a High Performance Distributed Training Platform Paper note |
Jianbo Dong; Zheng Cao; Tao Zhang; et.al. | Alibaba | |
NoC; | Experiences with ML-Driven Design: A NoC Case Study Paper note |
Jieming Yin; Subhash Sethumurugan; Yasuko Eckert; et.al. | AMD | |
sparsity | Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations Paper note |
Nitish Srivastava; Hanchen Jin; Shaden Smith; et.al. | Cornell; Intel | |
algorithm-architecture co-design | A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms Paper note |
Jian Weng; Sihao Liu; Zhengrong Wang; et.al. | UCLA | |
Reinforcement Learning; NoC; algorithm-architecture co-design | A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study Paper note |
Ting-Ru Lin; Drew Penney; Massoud Pedram; Lizhong Chen | USC; OSU | |
power optimization | Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices Paper note |
Jawad Haj-Yahya; Yanos Sazeides; Mohammed Alser; et.al. | ETHZ; Cyprus; CMU |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
training; compute-memory trade-off | HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array paper note |
Linghao Song; Jiachen Mao; Yiran Chen; et al. | Duke; USC | |
RNN; algorithm-architecture co-design | E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs paper note |
Zhe Li; Caiwen Ding; Siyue Wang | Syracuse University; Northeastern University; Florida International University; USC; University at Buffalo | |
CNN; Bit-serial; Sparsity | Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks paper note |
Xiaowei Wang; Jiecao Yu; Charles Augustine; et al. | Michigan; Intel | |
cross-Module optimization | Shortcut Mining: Exploiting Cross-layer Shortcut Reuse in DCNN Accelerators paper note |
Arash Azizimazreah; Lizhong Chen | OSU | |
PIM/CIM; low-bit; binary | NAND-Net: Minimizing Computational Complexity of In-Memory Processing for Binary Neural Networks paper note |
Hyeonuk Kim; Jaehyeong Sim; Yeongjae Choi; Lee-Sup Kim | KAIST | |
Accuracy-Latency trade-off | Kelp: QoS for Accelerators in Machine Learning Platforms paper note |
Haishan Zhu; David Lo; Liqun Cheng | Microsoft; Google; UT Austin | |
inference | Machine Learning at Facebook: Understanding Inference at the Edge paper note |
Carole-Jean Wu; David Brooks; Kevin Chen; et al. | ||
Architecture-Physical co-design | The Accelerator Wall: Limits of Chip Specialization paper note codes |
Adi Fuchs; David Wentzlaff | Princeton |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
special operation; approximate | Making Memristive Neural Network Accelerators Reliable paper note |
Ben Feinberg; Shibo Wang; Engin Ipek | University of Rochester | |
Algorithm-Architecture co-design; GAN | Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-based Deep Learning paper note |
Mingcong Song; Jiaqi Zhang; Huixiang Chen; Tao Li | University of Florida | |
compression; sparsity | Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks paper note |
Minsoo Rhu; Mike O'Connor; Niladrish Chatterjee; et al. | POSTECH; NVIDIA; UT-Austin | |
architecture-physical co-design; inference | In-situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems paper note |
Mingcong Song; Kan Zhong; Tao Li; et al. | University of Florida; Chongqing University; Capital Normal University | |
Special operation; ReRAM | GraphR: Accelerating Graph Processing Using ReRAM paper note |
Linghao Song; Youwei Zhuo; Xuehai Qian | Duke; USC | |
PIM; Special operation; dataflow | GraphP: Reducing Communication of PIM-based Graph Processing with Efficient Data Partition paper note |
Mingxing Zhang; Youwei Zhuo; Chao Wang; et al. | THU; USC; Stanford | |
Power optimization; PIM | PM3: Power Modeling and Power Management for Processing-in-Memory paper note |
Chao Zhang; Tong Meng; Guangyu Sun | PKU |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Inference; CNN; Dataflow | FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks paper note |
Wenyan Lu; Guihai Yan; Jiajun Li; et al. | Chinese Academy of Sciences | |
Inference; ReRAM | PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning paper note |
Linghao Song; Xuehai Qian; Hai Li; Yiran Chen | University of Pittsburgh; University of Southern California | |
Training | Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures paper note |
Mingcong Song; Yang Hu; Huixiang Chen; Tao Li | University of Florida |
Tags | - | Title | Authors | Affiliations |
---|---|---|---|---|
Programming model; training | TABLA: A Unified Template-based Architecture for Accelerating Statistical Machine Learning paper note |
Divya Mahajan; Jongse Park; Emmanuel Amaro | Georgia Institute of Technology | |
ReRAM; Boltzmann | Memristive Boltzmann Machine: A Hardware Accelerator for Combinatorial Optimization and Deep Learning paper note |
Mahdi Nazm Bojnordi; Engin Ipek | University of Rochester |