We are pleased to announce that our MoA solution has achieved state-of-the-art (SOTA) performance on the AlpacaEval 2.0 benchmark under length-controlled (LC) evaluation.
| Models | LC Win Rate | Win Rate |
|---|---|---|
| Shopee-SlimMoA-v1 | 77.5% | 75.6% |
| Blendax.AI-gm-l3-v35 | 73.4% | 73.4% |
| gemma-2-9b-it-SimPO | 72.4% | 65.9% |
| OpenPipe MoA GPT-4 Turbo | 68.4% | 63.2% |
| Together MoA | 65.4% | 59.9% |
| Storm-7B | 61.6% | 63.0% |
| GPT-4 Omni (05/13) | 57.5% | 51.3% |
| GPT-4 Turbo (04/09) | 55.0% | 46.1% |
| Claude 3.5 Sonnet (06/20) | 52.4% | 40.6% |
| Yi-Large Preview | 51.9% | 57.5% |
| GPT-4o Mini (07/18) | 50.7% | 44.7% |
| GPT-4 Preview (11/06) | 50.0% | 50.0% |
| Qwen2 72B Instruct | 38.1% | 29.9% |
| Llama 3.1 70B Instruct | 38.1% | 39.1% |
At minimum, a single A100 GPU is enough to run our MoA pipeline.
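As a rough illustration, the snippet below sketches how a quantized ~70B open-source model can generate candidate responses on a single A100 using vLLM. The model checkpoint, AWQ quantization, and sampling settings are illustrative assumptions, not the exact configuration of our pipeline.

```python
# Minimal sketch: single-GPU candidate generation with a quantized ~70B model.
# The model choice, AWQ quantization, and sampling settings are assumptions
# for illustration, not our exact production configuration.
from vllm import LLM, SamplingParams

# A 4-bit AWQ checkpoint of a ~70B chat model fits on one 80GB A100.
candidate_llm = LLM(
    model="Qwen/Qwen2-72B-Instruct-AWQ",   # hypothetical proposer choice
    quantization="awq",
    gpu_memory_utilization=0.90,
)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)

prompts = ["Explain the difference between supervised and unsupervised learning."]
outputs = candidate_llm.generate(prompts, sampling)

for out in outputs:
    print(out.outputs[0].text)
```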
- Through our ablations, we found that the quality of candidate responses is a second-order effect with limited impact on final performance. In our scenario (a chat-oriented benchmark), quantized versions of open-source large models (~70B) exhibit excellent response capabilities, and final performance even increases slightly compared to the non-quantized versions.
- By using quantized models, we reduce the computational overhead and resource requirements of LLM inference. This optimization not only accelerates inference but also lowers operational costs, making our approach both economically and technically advantageous.
- As a result, every model in our pipeline can run single-card inference, which is far more practical than typical MoA setups that require multiple GPUs or a distributed deployment. This streamlined pipeline enables rapid deployment and scaling, allowing us to handle high volumes of interactions with remarkable speed.
- Our experiments show that integrating multiple open-source LLMs from a variety of sources can substantially elevate the quality of answer generation. Our final selection balances the strengths and unique perspectives of each model, leading to a richer and more nuanced response pool.
- Diversity plays a crucial role: adding a marginal model from a different source benefits overall performance more than adding a more capable model (judged by its solo performance) from a similar source. A likely reason is that heterogeneity in the generated answers helps refine the overall output, as it mitigates individual model biases and leverages the collective intelligence of the ensemble.
- We found that an LLM's solo performance is positively correlated with its performance as the aggregator. Consequently, models specifically optimized for chat are markedly better suited to aggregating chat outputs than open-source large models with broader general abilities. For the AlpacaEval benchmark, models with superior chat performance (able to handle conversational nuances, manage context transitions smoothly, and generate coherent, contextually relevant responses) are exceptionally well-suited.
- This finding underscores the importance of tailoring the aggregator model selection to the specific requirements of the task at hand.
- We did not perform any prompt engineering; the aggregator prompt was copied directly from MoA (a minimal sketch of the aggregation step is shown below). This suggests there is room for further improvement.
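For completeness, the sketch below shows one way the aggregation step could look, assuming the aggregator is served behind an OpenAI-compatible endpoint and the candidate responses have already been collected. The system prompt paraphrases the publicly released MoA aggregate-and-synthesize template; the endpoint URL, aggregator model name, and `aggregate` helper are illustrative assumptions rather than our exact implementation.

```python
# Minimal sketch of the MoA aggregation step. The endpoint URL, aggregator
# model name, and helper function are assumptions for illustration; the system
# prompt paraphrases the publicly released MoA aggregate-and-synthesize template.
from openai import OpenAI

AGGREGATOR_SYSTEM_PROMPT = (
    "You have been provided with a set of responses from various open-source "
    "models to the latest user query. Your task is to synthesize these "
    "responses into a single, high-quality response. Critically evaluate the "
    "information, recognizing that some of it may be biased or incorrect, and "
    "provide a refined, accurate, and comprehensive reply.\n\n"
    "Responses from models:\n"
)

def aggregate(client: OpenAI, aggregator_model: str, query: str, candidates: list[str]) -> str:
    """Fuse candidate answers from the proposer models into one final answer."""
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    messages = [
        {"role": "system", "content": AGGREGATOR_SYSTEM_PROMPT + numbered},
        {"role": "user", "content": query},
    ]
    response = client.chat.completions.create(model=aggregator_model, messages=messages)
    return response.choices[0].message.content

# Example usage against a locally served, chat-optimized aggregator
# (e.g. a vLLM OpenAI-compatible server); the values below are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
final_answer = aggregate(
    client,
    aggregator_model="gemma-2-9b-it-SimPO",
    query="Explain the difference between supervised and unsupervised learning.",
    candidates=["Candidate answer A ...", "Candidate answer B ..."],
)
print(final_answer)
```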