Leaderboard Update, in sync with BFCL April 28th (New Model: `snowflake/arctic`) (#398)

This PR updates the website leaderboard to include the newly added model
`snowflake/arctic` from #397.

This PR **DOES** change the leaderboard ranking.

This PR **DOES NOT** change any leaderboard scores other than those of the
added model.
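
The two claims above (rankings shift, existing scores stay the same) can be checked mechanically. The following is a minimal sketch, not part of the PR: it uses only the first three columns (Rank, Overall Acc, Model) of a few rows from the `data.csv` diff in this commit, and the helper names `parse` and `only_added` are hypothetical.

```python
import csv
import io

# Abbreviated rows (Rank, Overall Acc, Model) copied from the data.csv diff.
OLD = """\
22,65.76%,DBRX-Instruct (Prompt)
23,63.47%,Mistral-large-2402 (FC Any)
24,61.00%,GPT-3.5-Turbo-0125 (FC)
"""
NEW = """\
22,65.76%,DBRX-Instruct (Prompt)
23,64.53%,Snowflake/snowflake-arctic-instruct (Prompt)
24,63.47%,Mistral-large-2402 (FC Any)
25,61.00%,GPT-3.5-Turbo-0125 (FC)
"""


def parse(text):
    """Parse CSV text into a list of rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def only_added(old_text, new_text, added_model):
    """Return True if the new rows equal the old rows plus exactly one
    added model, ignoring the Rank column (index 0), which may shift."""
    old_rows, new_rows = parse(old_text), parse(new_text)
    kept = [r for r in new_rows if r[2] != added_model]
    if len(kept) != len(new_rows) - 1:
        return False
    return [r[1:] for r in kept] == [r[1:] for r in old_rows]


print(only_added(OLD, NEW, "Snowflake/snowflake-arctic-instruct (Prompt)"))  # prints True
```

Comparing everything except the Rank column is a deliberate choice here: ranks are expected to move when a row is inserted, while every other field of the existing rows should be unchanged.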

---------

Co-authored-by: Huanzhi (Hans) Mao <[email protected]>
Fanjia-Yan and HuanzhiMao authored Apr 28, 2024
1 parent 7ae329d commit 8e00c85
Showing 5 changed files with 37 additions and 27 deletions.
18 changes: 14 additions & 4 deletions assets/css/main_page.css
@@ -98,13 +98,13 @@ em {

  @media (max-width: 768px) {
  .spinner {
- left: 10%;
+ left: 15%;
  }
  }

  @media (max-width: 380px) {
  .spinner {
- left: 5%;
+ left: 15%;
  }
  }

@@ -156,9 +156,19 @@ em {
  }
  }

+ .spinner-small-text-style {
+ font-size: 26px;
+ }
+ .spinner-big-text-style {
+ font-size: 30px;
+ }
+
  @media screen and (max-width: 768px) {
- .spinner-all-text-style {
- font-size: 26px;
+ .spinner-small-text-style {
+ font-size: 18px;
+ }
+ .spinner-big-text-style {
+ font-size: 22px;
  }
  .spinner-container {
  /* still keep as row */
35 changes: 18 additions & 17 deletions data.csv
@@ -21,20 +21,21 @@ Rank,Overall Acc,Model,Model Link,Organization,License,AST Summary,Exec Summary,
  20,68.59%,Claude-2.1 (Prompt),https://www.anthropic.com/news/claude-2-1,Anthropic,Proprietary,62.59%,62.17%,74.36%,80.75%,54.00%,64.00%,75.50%,55.50%,45.00%,71.18%,90.00%,44.29%,84.00%,46.00%,47.50%,83.33%,6.64,3.51,2.01,7.48
  21,67.41%,Mistral-large-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,62.09%,60.01%,66.36%,88.75%,5.00%,10.00%,94.00%,25.50%,62.50%,83.53%,99.00%,61.43%,96.00%,8.00%,52.50%,84.17%,4.94,3.03,2.92,8.85
  22,65.76%,DBRX-Instruct (Prompt),https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,Databricks,Databricks Open Model,65.26%,74.92%,66.55%,79.25%,30.00%,38.00%,72.00%,72.00%,50.50%,71.18%,80.00%,58.57%,86.00%,80.00%,62.50%,55.83%,1.25,0.62,0.42,1.34
- 23,63.47%,Mistral-large-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,68.98%,64.93%,82.91%,91.50%,62.00%,56.00%,93.00%,31.50%,68.50%,94.71%,95.00%,94.29%,92.00%,8.00%,65.00%,0.00%,3.94,2.04,1.31,4.88
- 24,61.00%,GPT-3.5-Turbo-0125 (FC),https://platform.openai.com/docs/models/gpt-3-5-turbo,OpenAI,Proprietary,70.52%,81.38%,57.09%,57.50%,53.00%,62.00%,65.50%,90.00%,69.50%,93.53%,95.00%,91.43%,80.00%,82.00%,70.00%,2.08%,0.43,1.28,0.76,2.49
- 25,59.88%,Mistral-small-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,64.27%,52.62%,81.09%,90.25%,56.00%,58.00%,95.50%,39.00%,41.50%,96.47%,100.00%,91.43%,92.00%,12.00%,10.00%,0.00%,0.96,1.11,0.93,2.8
- 26,59.24%,Meta-Llama-3-8B-Instruct (Prompt),https://llama.meta.com/llama3,Meta,Meta Llama 3 Community,59.68%,70.01%,58.73%,63.00%,44.00%,54.00%,73.00%,58.50%,48.50%,67.06%,67.00%,67.14%,82.00%,66.00%,65.00%,45.83%,0.24,0.04,N/A,N/A
- 27,59.18%,Claude-3-Sonnet-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.06%,43.32%,76.73%,86.00%,49.00%,58.00%,87.50%,6.00%,6.00%,85.29%,96.00%,70.00%,88.00%,0.00%,0.00%,81.67%,3.43,3.32,1.45,6.91
- 28,58.53%,Hermes-2-Pro-Mistral-7B (FC),https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,NousResearch,apache-2.0,67.99%,55.62%,71.45%,81.00%,42.00%,54.00%,81.00%,66.50%,53.00%,56.47%,78.00%,25.71%,70.00%,56.00%,40.00%,10.83%,0.49,0.08,N/A,N/A
- 29,56.24%,Gemini-1.5-Pro (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,43.11%,44.26%,81.45%,91.25%,51.00%,64.00%,91.00%,0.00%,0.00%,87.06%,97.00%,72.86%,90.00%,0.00%,0.00%,55.42%,1.28,2.35,3.19,3.78
- 30,53.47%,Claude-3-Haiku-20240307 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.69%,46.79%,85.27%,94.25%,60.00%,64.00%,93.00%,0.50%,0.00%,91.18%,96.00%,84.29%,94.00%,2.00%,0.00%,20.83%,0.29,1.54,0.61,2.46
- 31,52.82%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,38.53%,38.53%,61.64%,83.50%,4.00%,2.00%,92.50%,0.00%,0.00%,64.12%,95.00%,20.00%,90.00%,0.00%,0.00%,91.67%,10.39,3.41,3.21,10.69
- 32,52.76%,Gemini-1.0-Pro (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,39.58%,35.79%,67.82%,76.75%,37.00%,58.00%,90.50%,0.00%,0.00%,71.18%,74.00%,67.14%,72.00%,0.00%,0.00%,77.50%,0.2,1.16,0.74,1.93
- 33,52.71%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,39.94%,39.79%,67.27%,86.50%,13.00%,22.00%,92.50%,0.00%,0.00%,71.18%,95.00%,37.14%,88.00%,0.00%,0.00%,73.33%,N/A,1.5,1.51,4.49
- 34,52.18%,Nexusflow-Raven-v2 (FC),https://huggingface.co/Nexusflow/NexusRaven-V2-13B,Nexusflow,Apache 2.0,55.09%,61.78%,70.36%,76.25%,52.00%,60.00%,75.50%,30.50%,44.00%,64.12%,93.00%,22.86%,82.00%,46.00%,55.00%,2.08%,N/A,1.85,1.39,4.47
- 35,50.12%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,46.91%,36.16%,49.64%,61.75%,26.00%,0.00%,56.50%,47.50%,34.00%,27.65%,46.00%,1.43%,20.00%,62.00%,35.00%,83.75%,0.13,1.66,1.53,4.99
- 36,42.76%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,39.05%,31.75%,42.18%,47.75%,29.00%,24.00%,48.00%,30.00%,36.00%,30.00%,44.00%,10.00%,32.00%,40.00%,25.00%,70.83%,0.37,0.06,N/A,N/A
- 37,39.65%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,36.98%,30.89%,38.91%,49.50%,4.00%,24.00%,48.50%,37.00%,23.50%,37.06%,38.00%,35.71%,38.00%,36.00%,12.50%,57.08%,3.24,0.53,N/A,N/A
- 38,39.53%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,37.78%,38.03%,5.64%,5.75%,6.00%,4.00%,8.00%,79.00%,58.50%,34.12%,6.00%,74.29%,20.00%,68.00%,30.00%,98.33%,2.26,1.11,0.95,3.03
- 39,23.59%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.53%,34.37%,1.64%,2.25%,0.00%,0.00%,2.50%,3.00%,3.00%,56.47%,79.00%,24.29%,70.00%,6.00%,5.00%,99.58%,1.95,2.93,1.9,6.23
+ 23,64.53%,Snowflake/snowflake-arctic-instruct (Prompt),https://huggingface.co/Snowflake/snowflake-arctic-instruct,Snowflake,apache-2.0,58.84%,80.04%,64.36%,70.25%,42.00%,62.00%,69.00%,59.00%,43.00%,87.65%,91.00%,82.86%,86.00%,74.00%,72.50%,59.58%,N/A,0.99,0.56,2.13
+ 24,63.47%,Mistral-large-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,68.98%,64.93%,82.91%,91.50%,62.00%,56.00%,93.00%,31.50%,68.50%,94.71%,95.00%,94.29%,92.00%,8.00%,65.00%,0.00%,3.94,2.04,1.31,4.88
+ 25,61.00%,GPT-3.5-Turbo-0125 (FC),https://platform.openai.com/docs/models/gpt-3-5-turbo,OpenAI,Proprietary,70.52%,81.38%,57.09%,57.50%,53.00%,62.00%,65.50%,90.00%,69.50%,93.53%,95.00%,91.43%,80.00%,82.00%,70.00%,2.08%,0.43,1.28,0.76,2.49
+ 26,59.88%,Mistral-small-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,64.27%,52.62%,81.09%,90.25%,56.00%,58.00%,95.50%,39.00%,41.50%,96.47%,100.00%,91.43%,92.00%,12.00%,10.00%,0.00%,0.96,1.11,0.93,2.8
+ 27,59.24%,Meta-Llama-3-8B-Instruct (Prompt),https://llama.meta.com/llama3,Meta,Meta Llama 3 Community,59.68%,70.01%,58.73%,63.00%,44.00%,54.00%,73.00%,58.50%,48.50%,67.06%,67.00%,67.14%,82.00%,66.00%,65.00%,45.83%,0.24,0.04,N/A,N/A
+ 28,59.18%,Claude-3-Sonnet-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.06%,43.32%,76.73%,86.00%,49.00%,58.00%,87.50%,6.00%,6.00%,85.29%,96.00%,70.00%,88.00%,0.00%,0.00%,81.67%,3.43,3.32,1.45,6.91
+ 29,58.53%,Hermes-2-Pro-Mistral-7B (FC),https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,NousResearch,apache-2.0,67.99%,55.62%,71.45%,81.00%,42.00%,54.00%,81.00%,66.50%,53.00%,56.47%,78.00%,25.71%,70.00%,56.00%,40.00%,10.83%,0.49,0.08,N/A,N/A
+ 30,56.24%,Gemini-1.5-Pro (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,43.11%,44.26%,81.45%,91.25%,51.00%,64.00%,91.00%,0.00%,0.00%,87.06%,97.00%,72.86%,90.00%,0.00%,0.00%,55.42%,1.28,2.35,3.19,3.78
+ 31,53.47%,Claude-3-Haiku-20240307 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.69%,46.79%,85.27%,94.25%,60.00%,64.00%,93.00%,0.50%,0.00%,91.18%,96.00%,84.29%,94.00%,2.00%,0.00%,20.83%,0.29,1.54,0.61,2.46
+ 32,52.82%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,38.53%,38.53%,61.64%,83.50%,4.00%,2.00%,92.50%,0.00%,0.00%,64.12%,95.00%,20.00%,90.00%,0.00%,0.00%,91.67%,10.39,3.41,3.21,10.69
+ 33,52.76%,Gemini-1.0-Pro (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,39.58%,35.79%,67.82%,76.75%,37.00%,58.00%,90.50%,0.00%,0.00%,71.18%,74.00%,67.14%,72.00%,0.00%,0.00%,77.50%,0.2,1.16,0.74,1.93
+ 34,52.71%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,39.94%,39.79%,67.27%,86.50%,13.00%,22.00%,92.50%,0.00%,0.00%,71.18%,95.00%,37.14%,88.00%,0.00%,0.00%,73.33%,N/A,1.5,1.51,4.49
+ 35,52.18%,Nexusflow-Raven-v2 (FC),https://huggingface.co/Nexusflow/NexusRaven-V2-13B,Nexusflow,Apache 2.0,55.09%,61.78%,70.36%,76.25%,52.00%,60.00%,75.50%,30.50%,44.00%,64.12%,93.00%,22.86%,82.00%,46.00%,55.00%,2.08%,N/A,1.85,1.39,4.47
+ 36,50.12%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,46.91%,36.16%,49.64%,61.75%,26.00%,0.00%,56.50%,47.50%,34.00%,27.65%,46.00%,1.43%,20.00%,62.00%,35.00%,83.75%,0.13,1.66,1.53,4.99
+ 37,42.76%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,39.05%,31.75%,42.18%,47.75%,29.00%,24.00%,48.00%,30.00%,36.00%,30.00%,44.00%,10.00%,32.00%,40.00%,25.00%,70.83%,0.37,0.06,N/A,N/A
+ 38,39.65%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,36.98%,30.89%,38.91%,49.50%,4.00%,24.00%,48.50%,37.00%,23.50%,37.06%,38.00%,35.71%,38.00%,36.00%,12.50%,57.08%,3.24,0.53,N/A,N/A
+ 39,39.53%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,37.78%,38.03%,5.64%,5.75%,6.00%,4.00%,8.00%,79.00%,58.50%,34.12%,6.00%,74.29%,20.00%,68.00%,30.00%,98.33%,2.26,1.11,0.95,3.03
+ 40,23.59%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.53%,34.37%,1.64%,2.25%,0.00%,0.00%,2.50%,3.00%,3.00%,56.47%,79.00%,24.29%,70.00%,6.00%,5.00%,99.58%,1.95,2.93,1.9,6.23
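
The updated rows appear to keep ranks consecutive and ordered by Overall Acc, which is why the new model lands at rank 23. A rough sanity check of that property might look like the sketch below. It assumes ranks are meant to be consecutive integers sorted by non-increasing Overall Acc (this is inferred from the rows shown, not stated in the PR), and the function name `ranks_consistent` is hypothetical.

```python
def ranks_consistent(rows):
    """rows: list of (rank, overall_acc) string pairs in file order.

    Returns True if ranks increase by exactly 1 and accuracy never increases.
    """
    ranks = [int(rank) for rank, _ in rows]
    accs = [float(acc.rstrip("%")) for _, acc in rows]
    consecutive = all(b == a + 1 for a, b in zip(ranks, ranks[1:]))
    non_increasing = all(a >= b for a, b in zip(accs, accs[1:]))
    return consecutive and non_increasing


# (Rank, Overall Acc) pairs copied from the updated data.csv rows.
NEW_ROWS = [("22", "65.76%"), ("23", "64.53%"), ("24", "63.47%"), ("25", "61.00%")]
print(ranks_consistent(NEW_ROWS))  # prints True
```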
4 changes: 2 additions & 2 deletions index.html
@@ -98,12 +98,12 @@ <h6 class="text-center">[email protected], [email protected]</h6>
  <div class="container spinner-container" style="background: #e5effc;">
  <div>
  <div class="col-md-12 api-appstore">
- <h3 class="spinner-all-text-style">Systems and Algorithms for Integrating LLMs with Applications,
+ <h3 class="spinner-small-text-style">Systems and Algorithms for Integrating LLMs with Applications,
  Tools, and Services</h3>
  </div>
  <div class="col-md-12">
  <div class="spinner">
- <h3 class="spinner-all-text-style">Gorilla Used at
+ <h3 class="spinner-big-text-style">Gorilla Used at
  <div class="spinner__text">
  <span class="spinner__text--top"
  data-texts="Microsoft, Nvidia, Tesla, OpenAI, Linkedin, Netflix, MIT, Cisco, Anthropic, Weaviate, Cohere"></span>
5 changes: 2 additions & 3 deletions leaderboard.html
@@ -35,8 +35,7 @@
  <link rel="stylesheet" href="assets/css/contact.css" />
  <link rel="stylesheet" href="assets/css/styles.css" />
  <title>
- Berkeley Function Calling Leaderboard (aka Berkeley Tool Calling
- Leaderboard)
+ Berkeley Function Calling Leaderboard (aka Berkeley Tool Calling Leaderboard)
  </title>

  </head>
@@ -200,7 +199,7 @@ <h2>Error Type Analysis</h2>
  (coming soon).
  </p>

- <div id="84b2db9c-16e5-431a-8b51-de5fa49ee5dd" class="plotly-graph-div" style="height:100%; width:100%;"></div>
+ <div id="4f30cba8-aaa3-4006-8bde-19d01eb8df62" class="plotly-graph-div" style="height:100%; width:100%;"></div>
  </div>

  <!-- API Explorer Section -->
2 changes: 1 addition & 1 deletion treemap_2.js

Large diffs are not rendered by default.