# Reading

## List of Mixture of Experts (MoE) and Large Language Model (LLM) Papers Focusing on Model Upcycling

This repository collects papers on Mixture of Experts (MoE) and Large Language Models (LLMs), with a focus on model upcycling, along with links to their arXiv pages and, where available, GitHub code.

| # | Paper Title | Year | Link | Code |
|---|-------------|------|------|------|
| 1 | Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence | 2024 | arXiv | No code available |
| 2 | Upcycling Large Language Models into Mixture of Experts | 2024 | arXiv | No code available |
| 3 | Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | 2024 | arXiv | No code available |
| 4 | Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | 2024 | arXiv | No code available |
| 5 | Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization | 2024 | arXiv | GitHub |
| 6 | Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | 2024 | arXiv | No code available |
| 7 | Scaling Laws for Fine-Grained Mixture of Experts | 2024 | arXiv | GitHub |
| 8 | Scaling Expert Language Models with Unsupervised Domain Discovery | 2023 | arXiv | GitHub |
| 9 | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 2023 | arXiv | GitHub |
| 10 | Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models | 2023 | arXiv | GitHub |
| 11 | Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints | 2023 | arXiv | GitHub |
| 12 | Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging | 2024 | arXiv | No code available |

## Advances in Weight Generation and Retrieval for Language Models

| # | Paper Title | Year | Link | Code |
|---|-------------|------|------|------|
| 1 | Representing Model Weights with Language using Tree Experts | 2024 | arXiv | No code available |
| 2 | Deep Linear Probe Generators for Weight Space Learning | 2024 | arXiv | GitHub |
| 3 | Knowledge Fusion By Evolving Weights of Language Models | 2024 | arXiv | GitHub |

- Vector Quantization Prompting for Continual Learning
- Historical Test-time Prompt Tuning for Vision Foundation Models

## Contributing

Feel free to open a pull request if you find new papers or code related to MoE and LLMs. Let's keep this list growing!