
# Papers-Reading

## LLM

### Models

| Date | Paper | Key Words |
| --- | --- | --- |
| 2023.4.17 | Visual Instruction Tuning | LLaVA |
| 2024.3.8 | DeepSeek-VL: Towards Real-World Vision-Language Understanding | DeepSeek-VL: Dense & VLM |
| 2024.7.10 | PaliGemma: A versatile 3B VLM for transfer | Google small VLM: PaliGemma |
| 2024.10.8 | Aria: An Open Multimodal Native Mixture-of-Experts Model | First MoE VLM: Aria |
| 2024.12.6 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | VLM: InternVL 2.5 |
| 2024.12.13 | DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | DeepSeek-VL2: MoE & VLM |
| 2024.12.27 | DeepSeek-V3 Technical Report | DeepSeek-V3 Technical Report |

### Quantization

| Date | Paper | Key Words |
| --- | --- | --- |
| 2022.6.4 | ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | INT8 weights and INT8 activations |
| 2022.8.15 | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | LLM.int8 |
| 2022.11.18 | SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | 8-bit Weight, 8-bit Activation (W8A8) |
| 2023.5.23 | Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization | Parameter-Efficient and Quantization-aware Adaptation (PEQA) [LLM-QAT] |
| 2023.5.23 | QLoRA: Efficient Finetuning of Quantized LLMs | QLoRA & NF4 (4-bit NormalFloat) [LLM-QAT] |
| 2023.5.29 | LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | LLM Quantization Aware Training [LLM-QAT] |
| 2023.3.13 | FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | KV Cache 4-bit |
| 2023.6.1 | AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | Activation-aware Weight Quantization (AWQ) |
| 2023.6.13 | SqueezeLLM: Dense-and-Sparse Quantization | 3-bit weight quantization (Dense-and-Sparse) |
| 2024.1.31 | KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | KV Cache 2/3/4-bit |
| 2024.2.5 | KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache | KV Cache 2-bit |
| 2024.2.26 | A Comprehensive Evaluation of Quantization Strategies for Large Language Models | PTQ |
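
The W8A8 recipe shared by ZeroQuant, LLM.int8() and SmoothQuant above boils down to mapping floating-point tensors onto 8-bit integers with a scale factor and doing the matmul in integer arithmetic. A minimal per-tensor symmetric INT8 sketch in NumPy, purely illustrative and not taken from any of the papers (the per-channel scales and outlier/smoothing tricks that make these methods accurate are omitted):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x ~= scale * q."""
    amax = float(np.abs(x).max())
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Quantize both operands, accumulate in int32, rescale back to float."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)  # integer accumulation
    return acc.astype(np.float32) * (sa * sb)

x = np.random.randn(4, 8).astype(np.float32)   # "activations"
w = np.random.randn(8, 3).astype(np.float32)   # "weights"
print(np.abs(int8_matmul(x, w) - x @ w).max())  # small quantization error
```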

### MoE

| Date | Paper | Key Words |
| --- | --- | --- |
| 2021.1.11 | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Mixture of Experts (MoE) |
| 2024.1.11 | DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | DeepSeekMoE |
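
The mechanism these two papers build on is a learned router that sends each token to a small subset of expert FFNs and combines the outputs with the gate weights. A minimal top-2 routing sketch in NumPy; names such as `n_experts` and `top_k` are illustrative, and the load-balancing losses both papers rely on are left out:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

tokens = rng.normal(size=(8, d_model))               # 8 token embeddings
router_w = rng.normal(size=(d_model, n_experts))     # router projection
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

logits = tokens @ router_w
logits -= logits.max(-1, keepdims=True)              # stabilize softmax
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

out = np.zeros_like(tokens)
for i, (p, tok) in enumerate(zip(probs, tokens)):
    chosen = np.argsort(p)[-top_k:]                  # top-k experts per token
    for e in chosen:
        out[i] += p[e] * (tok @ experts[e])           # gate-weighted expert output
```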

### Inference

| Date | Paper | Key Words |
| --- | --- | --- |
| 2017.6.12 | Attention Is All You Need | Transformer & Attention |
| 2018.6.11 | Improving Language Understanding by Generative Pre-Training | GPT (generative pre-training) |
| 2018.10.11 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | BERT (Bidirectional Encoder Representations from Transformers) |
| 2019.1.9 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Transformer-XL (extra-long) |
| 2019.5.17 | ERNIE: Enhanced Language Representation with Informative Entities | Knowledge graphs with BERT |
| 2022.5.27 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Flash Attention |
| 2023.7.18 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Flash Attention 2 |
| 2024.2.27 | Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations | LLM for large-scale recommendation systems |
| 2024.3.19 | When Do We Not Need Larger Vision Models? | Scaling on Scales |
| 2024.7.12 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Flash Attention 3 (optimized for Hopper GPUs, e.g. H100) |
| 2024.7.28 | Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights | Advertising with multimodal representations |
| 2024.8.22 | NanoFlow: Towards Optimal Large Language Model Serving Throughput | A novel serving framework: NanoFlow |
| 2024.10.3 | SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | Sage Attention |
| 2024.11.17 | SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | Sage Attention 2 |
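
Most of the attention papers above optimize the same primitive introduced in "Attention Is All You Need": scaled dot-product attention, softmax(QK^T / sqrt(d)) V. A minimal single-head reference implementation in NumPy; the IO-aware tiling of FlashAttention and the low-bit tricks of SageAttention are deliberately omitted:

```python
import numpy as np

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """softmax(q @ k^T / sqrt(d)) @ v for a single attention head."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v

q = np.random.randn(5, 64)
k = np.random.randn(7, 64)
v = np.random.randn(7, 64)
print(attention(q, k, v).shape)  # (5, 64)
```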

### Transformer

| Date | Paper | Key Words |
| --- | --- | --- |
| 2020.10.22 | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Vision Transformer (ViT) |
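
ViT's "an image is worth 16x16 words" idea is mostly reshaping: cut the image into fixed-size patches, flatten each patch, and linearly project it to the model dimension so the patches become tokens for a standard Transformer. A hedged NumPy sketch; the 224x224 input, 16x16 patch size, and `d_model = 768` are illustrative defaults, and the class token and position embeddings are omitted:

```python
import numpy as np

image = np.random.rand(224, 224, 3)   # H x W x C input image
patch, d_model = 16, 768

# Cut into non-overlapping 16x16 patches and flatten each one.
grid = image.reshape(224 // patch, patch, 224 // patch, patch, 3)
patches = grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)  # (196, 768)

proj = np.random.randn(patch * patch * 3, d_model) * 0.02  # learned in practice
tokens = patches @ proj                                     # (196, d_model)
print(tokens.shape)
```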

### Others

| Date | Paper | Key Words |
| --- | --- | --- |
| 2019.2.24 | Language Models are Unsupervised Multitask Learners | GPT-2 |
| 2019.10.2 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Distilled BERT & knowledge distillation |
| 2019.10.23 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Unified Text-to-Text Transformer & T5 (Encoder-Decoder) |
| 2020.5.22 | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | Retrieval-Augmented Generation (RAG) |
| 2020.5.28 | Language Models are Few-Shot Learners | GPT-3 |
| 2021.4.20 | RoFormer: Enhanced Transformer with Rotary Position Embedding | RoPE |
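
RoFormer's rotary position embedding (RoPE) encodes position by rotating pairs of query/key dimensions by position-dependent angles, so attention scores depend only on relative offsets. A minimal NumPy sketch; the base of 10000 follows the paper, but this uses the common half-split dimension layout rather than the paper's interleaved pairs, and the array names are illustrative:

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)          # per-pair rotation frequencies
    angles = np.outer(np.arange(seq_len), freqs)       # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]                  # pair up dimensions
    return np.concatenate([x1 * cos - x2 * sin,        # rotate each (x1, x2) pair
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(10, 64)
print(rope(q).shape)  # (10, 64)
```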

## Algorithm

| Date | Paper | Key Words |
| --- | --- | --- |
| 1972 | Karp's 21 NP-complete problems | Karp's 21 NP-complete problems |
| 1973 | An n^{5/2} algorithm for maximum matchings in bipartite graphs | Hopcroft-Karp Algorithm |
| 2002 | A 27/26-Approximation Algorithm for the Chromatic Sum Coloring of Bipartite Graphs | Chromatic Sum Coloring of Bipartite Graphs |
| 2015.6.16 | An Efficient Data Structure for Processing Palindromes in Strings | Palindromic Tree |
| 2017.8.11 | An Introduction to Quantum Computing, Without the Physics | Quantum Computing, Without the Physics |
| 2018.7.30 | A Simple Near-Linear Pseudopolynomial Time Randomized Algorithm for Subset Sum | Near-linear randomized subset sum |
| 2021.2.11 | Hybrid Neural Fusion for Full-frame Video Stabilization | Video Stabilization Algorithm |
| 2022.11.21 | The Berlekamp-Massey Algorithm revisited | Berlekamp-Massey Algorithm |
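
For contrast with the near-linear randomized subset-sum paper above, the classic pseudopolynomial baseline it improves on is the O(n*t) bitset DP, which Python's arbitrary-precision integers express in a few lines. This is the textbook DP, not the paper's randomized algorithm:

```python
def subset_sum(nums: list[int], target: int) -> bool:
    """Bitset DP: bit i of `reachable` is set iff some subset sums to i."""
    reachable = 1                             # only the empty sum 0 is reachable
    for x in nums:
        reachable |= reachable << x           # adding x shifts every reachable sum by x
        reachable &= (1 << (target + 1)) - 1  # keep only sums <= target
    return bool((reachable >> target) & 1)

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True  (4 + 5)
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```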

## Engineering

| Date | Paper | Key Words |
| --- | --- | --- |

## About

šŸ¬Some papers & books Iā€™ve read.
