A curated list of LLMs and related studies targeted at mobile and embedded hardware
Last update: 10th April 2024
If your publication/work is not included - and you think it should - please open an issue or reach out directly to @stevelaskaridis.
Let's try to make this list as useful as possible to researchers, engineers and practitioners all around the world.
## Contents

- Mobile-First LLMs
- Infrastructure / Deployment of LLMs on Device
- Benchmarking LLMs on Device
- Applications
- Multimodal LLMs
- Surveys on Efficient LLMs
- Training LLMs on Device
- Mobile-Related Use-Cases
- Related Awesome Repositories
## Mobile-First LLMs

The following table lists sub-3B models designed for on-device deployment, sorted by year.
Name | Year | Sizes | Primary Group/Affiliation | Publication | Code Repository | HF Repository |
---|---|---|---|---|---|---|
MobileLLM | 2024 | 125M, 250M | Meta | paper | - | - |
Gemma | 2024 | 2B, ... | Google | website | code, gemma.cpp | huggingface |
MobiLlama | 2024 | 0.5B, 1B | MBZUAI | paper | code | huggingface |
TinyLlama | 2024 | 1.1B | Singapore University of Technology and Design | paper | code | huggingface |
Gemini Nano | 2024 | 1.8B, 3.25B | Google | paper | - | - |
Phi-2 | 2023 | 2.7B | Microsoft | website | - | huggingface |
Phi-1.5 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
Phi-1 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
RWKV | 2023 | 169M, 430M, 1.5B, 3B, ... | EleutherAI | paper | code | huggingface |
Cerebras-GPT | 2023 | 111M, 256M, 590M, 1.3B, 2.7B ... | Cerebras | paper | code | huggingface |
OPT | 2022 | 125M, 350M, 1.3B, 2.7B, ... | Meta | paper | code | huggingface |
LaMini-LM | 2023 | 61M, 77M, 111M, 124M, 223M, 248M, 256M, 590M, 774M, 738M, 783M, 1.3B, 1.5B, ... | MBZUAI | paper | code | huggingface |
Pythia | 2023 | 70M, 160M, 410M, 1B, 1.4B, 2.8B, ... | EleutherAI | paper | code | huggingface |
Galactica | 2022 | 125M, 1.3B, ... | Meta | paper | code | huggingface |
BLOOM | 2022 | 560M, 1.1B, 1.7B, 3B, ... | BigScience | paper | code | huggingface |
XGLM | 2021 | 564M, 1.7B, 2.9B, ... | Meta | paper | code | huggingface |
GPT-Neo | 2021 | 125M, 350M, 1.3B, 2.7B | EleutherAI | - | code, gpt-neox | huggingface |
MobileBERT | 2020 | 15.1M, 25.3M | CMU, Google | paper | code | huggingface |
BART | 2019 | 140M, 400M | Meta | paper | code | huggingface |
DistilBERT | 2019 | 66M | HuggingFace | paper | code | huggingface |
T5 | 2019 | 60M, 220M, 770M, 3B, ... | Google | paper | code | huggingface |
TinyBERT | 2019 | 14.5M | Huawei | paper | code | huggingface |
Megatron-LM | 2019 | 336M, 1.3B, ... | Nvidia | paper | code | - |
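A quick back-of-the-envelope check for whether a model in the table fits on a phone: weight memory is roughly parameter count × bits per weight. A minimal sketch (the 1.1B figure matches TinyLlama-class models; KV cache, activations, and runtime overhead come on top, so treat these as lower bounds):

```python
def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the model weights alone.

    Ignores KV cache, activations, and runtime overhead, which add more.
    """
    return params * bits_per_weight / 8 / 1e9

# A 1.1B-parameter model at different quantization levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(1.1e9, bits):.2f} GB")
# 16-bit: 2.20 GB
# 8-bit: 1.10 GB
# 4-bit: 0.55 GB
```

This is why 4-bit quantization features so prominently in the deployment frameworks below: it is often the difference between fitting and not fitting in a phone's memory budget.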
## Infrastructure / Deployment of LLMs on Device

This section showcases frameworks and contributions supporting LLM inference on mobile and edge devices.
- llama.cpp
- LLMFarm: iOS frontend for llama.cpp
- Sherpa: Android frontend for llama.cpp
- dusty-nv's llama.cpp: Containers for Jetson deployment of llama.cpp
- MLC-LLM
- Android App: MLC Android app
- iOS App: MLC iOS app
- dusty-nv's MLC: Containers for Jetson deployment of MLC
- Google MediaPipe
- Apple MLX
- Alibaba MNN
- llama2.c (More educational, see here for android port)
- tinygrad
- TinyChatEngine (Targeted at Nvidia, Apple M1 and RPi)
- [MobiCom'24] Mobile Foundation Model as Firmware (paper, code)
- Merino: Entropy-driven Design for Generative Language Models on IoT Devices (paper)
- LLM as a System Service on Mobile Devices (paper)
- LLMCad: Fast and Scalable On-device Large Language Model Inference (paper)
- EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models (paper)
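Most of the runtimes above implement the same two-phase autoregressive loop: a one-shot prompt prefill followed by token-by-token decoding against a growing KV cache. A minimal sketch with a stub model (all names here are hypothetical, not any framework's actual API):

```python
from typing import List

class StubLM:
    """Toy stand-in for an on-device LLM runtime (hypothetical API).

    Real engines (llama.cpp, MLC-LLM, MNN, ...) expose similar
    prefill/decode phases, carrying per-layer K/V tensors between steps.
    """
    def __init__(self):
        self.kv_cache: List[int] = []  # stands in for cached K/V tensors

    def prefill(self, prompt_tokens: List[int]) -> int:
        # Process the whole prompt once; the cache grows by its length.
        self.kv_cache.extend(prompt_tokens)
        return prompt_tokens[-1] + 1  # fake "next token"

    def decode_step(self, token: int) -> int:
        # Each step attends over the cache and appends one entry.
        self.kv_cache.append(token)
        return token + 1

def generate(model: StubLM, prompt: List[int], max_new: int) -> List[int]:
    out = [model.prefill(prompt)]
    while len(out) < max_new:
        out.append(model.decode_step(out[-1]))
    return out

print(generate(StubLM(), [1, 2, 3], 4))  # -> [4, 5, 6, 7]
```

The split matters on device because prefill is compute-bound while decode is memory-bandwidth-bound, which is why papers above (e.g. LLMCad, EdgeMoE) attack the decode phase specifically.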
## Benchmarking LLMs on Device

This section focuses on measurement and benchmarking efforts for assessing LLM performance when deployed on device.
- MELTing point: Mobile Evaluation of Language Transformers (paper)
## Applications

- Octopus v2: On-device language model for super agent (paper)
- Towards an On-device Agent for Text Rewriting (paper)
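Whether profiling a raw model or an on-device agent, the two numbers benchmarks typically report are time-to-first-token (prefill latency) and decode throughput. A minimal harness sketch, with a stub generator standing in for a real runtime (the 1 ms step delay is an arbitrary illustration, not a measured figure):

```python
import time

def benchmark(generate_fn, prompt, max_new_tokens=32):
    """Measure time-to-first-token and decode throughput of a token stream.

    `generate_fn` is any iterable-of-tokens producer (hypothetical interface).
    """
    start = time.perf_counter()
    first_token_s = None
    n = 0
    for _ in generate_fn(prompt, max_new_tokens):
        n += 1
        if first_token_s is None:
            first_token_s = time.perf_counter() - start
    total_s = time.perf_counter() - start
    decode_s = total_s - first_token_s
    tok_per_s = (n - 1) / decode_s if n > 1 and decode_s > 0 else float("inf")
    return {"ttft_s": first_token_s, "tokens": n, "decode_tok_per_s": tok_per_s}

def stub_generate(prompt, max_new_tokens):
    # Stand-in for a real on-device runtime's token stream.
    for i in range(max_new_tokens):
        time.sleep(0.001)  # pretend each decode step takes ~1 ms
        yield i

stats = benchmark(stub_generate, "hello", max_new_tokens=8)
print(stats["tokens"])  # 8
```

On real hardware one would also pin thermal state and repeat runs, since mobile SoCs throttle; see the benchmarking papers above for methodology.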
## Multimodal LLMs

This section refers to multimodal LLMs, which integrate vision or other modalities into their tasks.
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models (paper, code)
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model (paper, code)
## Surveys on Efficient LLMs

This section includes survey papers on LLM efficiency, a topic closely related to deployment on constrained devices.
- A Survey of Resource-efficient LLM and Multimodal Foundation Models (paper)
- Efficient Large Language Models: A Survey (paper, code)
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems (paper)
- A Survey on Model Compression for Large Language Models (paper)
## Training LLMs on Device

This section covers papers on training or fine-tuning LLMs on device, in a standalone or federated manner.
- [MobiCom'23] Federated Few-Shot Learning for Mobile NLP (paper, code)
- FwdLLM: Efficient FedLLM using Forward Gradient (paper, code)
- [Electronics'24] Forward Learning of Large Language Models by Consumer Devices (paper)
- Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly (paper)
- Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes (paper, code)
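The federated approaches above all build on some form of server-side aggregation. As a reference point, vanilla FedAvg weighted averaging can be sketched as follows (this is the generic baseline, not the specific method of any paper listed):

```python
def fedavg(client_updates):
    """Weighted average of client parameter vectors (vanilla FedAvg).

    client_updates: list of (num_examples, params) pairs, where params is a
    flat list of floats. Federated LLM tuning usually averages only a small
    trainable subset (e.g. adapter/LoRA weights) to keep communication low.
    """
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    avg = [0.0] * dim
    for n, params in client_updates:
        weight = n / total
        for i, p in enumerate(params):
            avg[i] += weight * p
    return avg

# Two clients weighted 1:3 by their number of local examples:
print(fedavg([(10, [0.0, 4.0]), (30, [4.0, 0.0])]))  # -> [3.0, 1.0]
```

The communication-cost results above (e.g. sub-18 KB full-parameter tuning) come precisely from shrinking or compressing what each client ships into this averaging step.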
## Mobile-Related Use-Cases

This section includes papers that are mobile-related, but do not necessarily run on device.
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs (paper)
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception (paper, code)
- [NeurIPS'23] AndroidInTheWild: A Large-Scale Dataset For Android Device Control (paper, code)
- GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation (paper, code)
- [ACL'20] Mapping Natural Language Instructions to Mobile UI Action Sequences (paper)
## Related Awesome Repositories

If you want to read more about related topics, here are some tangential awesome repositories to visit:
- Hannibal046/Awesome-LLM on Large Language Models
- KennethanCeyer/awesome-llm on Large Language Models
- HuangOwen/Awesome-LLM-Compression on Large Language Model Compression
- csarron/awesome-emdl on Embedded and Mobile Deep Learning
Contributions welcome! Read the contribution guidelines first.