paul-gauthier committed Nov 24, 2024
1 parent c550422 commit aee94a0
Showing 2 changed files with 36 additions and 6 deletions.
23 changes: 23 additions & 0 deletions aider/website/_data/quant.yml
@@ -274,3 +274,26 @@
versions: 0.64.2.dev
seconds_per_case: 110.0
total_cost: 0.1763

- dirname: 2024-11-24-15-00-50--qwen25-32b-or-deepinfra
test_cases: 133
model: "Deepinfra via OpenRouter: BF16"
edit_format: diff
commit_hash: c2f184f
pass_rate_1: 57.1
pass_rate_2: 69.9
percent_cases_well_formed: 89.5
error_outputs: 35
num_malformed_responses: 31
num_with_malformed_responses: 14
user_asks: 11
lazy_comments: 0
syntax_errors: 1
indentation_errors: 1
exhausted_context_windows: 4
test_timeouts: 1
command: aider --model openrouter/qwen/qwen-2.5-coder-32b-instruct
date: 2024-11-24
versions: 0.64.2.dev
seconds_per_case: 28.5
total_cost: 0.1390
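The new entry records both raw counts and derived percentages. The Python sketch below checks that they agree, assuming the derived fields are computed as written here. These formulas are inferred from the numbers themselves and are not taken from aider's benchmark code. Under that assumption, 14 of 133 cases with malformed responses gives 119/133 ≈ 89.5% well-formed, and the two pass rates correspond to 76 and 93 passing cases. The helper names `percent_well_formed` and `cases_passed` are hypothetical.

```python
# Hypothetical consistency check for one quant.yml entry.
# Field names and values are copied from the entry above; the formulas
# are assumptions that reproduce the recorded percentages.

entry = {
    "test_cases": 133,
    "pass_rate_1": 57.1,
    "pass_rate_2": 69.9,
    "percent_cases_well_formed": 89.5,
    "num_with_malformed_responses": 14,
    "total_cost": 0.1390,
}


def percent_well_formed(test_cases: int, num_with_malformed: int) -> float:
    """Assumed formula: percentage of cases with no malformed response."""
    return round(100.0 * (test_cases - num_with_malformed) / test_cases, 1)


def cases_passed(pass_rate: float, test_cases: int) -> int:
    """Recover the integer count of passing cases from a rounded percentage."""
    return round(pass_rate * test_cases / 100.0)


n = entry["test_cases"]
well_formed = percent_well_formed(n, entry["num_with_malformed_responses"])
passed_1 = cases_passed(entry["pass_rate_1"], n)
passed_2 = cases_passed(entry["pass_rate_2"], n)
cost_per_case = entry["total_cost"] / n  # average API cost per test case, in dollars

# The recovered counts should round-trip to the recorded pass rates.
assert round(100.0 * passed_1 / n, 1) == entry["pass_rate_1"]
assert round(100.0 * passed_2 / n, 1) == entry["pass_rate_2"]

print(f"well-formed: {well_formed}%")                # well-formed: 89.5%
print(f"passed: {passed_1} then {passed_2} of {n}")  # passed: 76 then 93 of 133
print(f"avg cost per case: ${cost_per_case:.5f}")    # avg cost per case: $0.00105
```

Backing out integer counts is a cheap sanity check when adding hand-edited entries to the YAML file. If a recomputed percentage does not round-trip, one of the fields was probably copied incorrectly.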
19 changes: 13 additions & 6 deletions aider/website/_posts/2024-11-21-quantization.md
@@ -18,11 +18,6 @@ can impact code editing skill.
Heavily quantized models are often used by cloud API providers
and local model servers like Ollama or MLX.

<canvas id="quantChart" width="800" height="500" style="margin: 20px 0"></canvas>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<script>
{% include quant-chart.js %}
</script>

The graph above compares different versions of the Qwen 2.5 Coder 32B Instruct model,
served both locally and from cloud providers.
@@ -34,11 +29,23 @@ served both locally and from cloud providers.
- Other API providers.

The best version of the model rivals GPT-4o, while the worst performer
is more like GPT-4 Turbo level.
is worse than GPT-3.5 Turbo.

Hyperbolic via OpenRouter in particular is confusing.
Their direct API produces excellent results, but the performance
through OpenRouter is very poor.
It's unclear why this is happening to just this provider.
The other providers available through OpenRouter perform similarly
when their API is accessed directly.

{: .note }
This article is being updated as additional benchmark runs complete.

<canvas id="quantChart" width="800" height="600" style="margin: 20px 0"></canvas>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<script>
{% include quant-chart.js %}
</script>

<input type="text" id="quantSearchInput" placeholder="Search..." style="width: 100%; max-width: 800px; margin: 10px auto; padding: 8px; display: block; border: 1px solid #ddd; border-radius: 4px;">
