
Commit

Merge pull request #2440 from ivanfioravanti/main
MLX 4bit and 8bit diff added
paul-gauthier authored Nov 23, 2024
2 parents 65d7957 + 3dc5021 commit 80f5b60
Showing 2 changed files with 38 additions and 17 deletions.
49 changes: 36 additions & 13 deletions aider/website/_data/quant.yml
@@ -70,25 +70,48 @@

 - dirname: 2024-11-22-17-53-35--qwen25-coder-32b-Instruct-4bit
 test_cases: 133
-model: mlx-community/Qwen2.5-Coder-32B-Instruct-4bit (whole)
-edit_format: whole
-commit_hash: 0ccf04a-dirty
-pass_rate_1: 57.1
-pass_rate_2: 69.2
-percent_cases_well_formed: 100.0
-error_outputs: 70
-num_malformed_responses: 0
-num_with_malformed_responses: 0
-user_asks: 0
+model: mlx-community/Qwen2.5-Coder-32B-Instruct-4bit
+edit_format: diff
+commit_hash: a16dcab-dirty
+pass_rate_1: 60.2
+pass_rate_2: 72.2
+percent_cases_well_formed: 88.7
+error_outputs: 31
+num_malformed_responses: 30
+num_with_malformed_responses: 15
+user_asks: 6
 lazy_comments: 0
 syntax_errors: 0
 indentation_errors: 0
-exhausted_context_windows: 0
+exhausted_context_windows: 1
 test_timeouts: 0
 command: aider --model openai/mlx-community/Qwen2.5-Coder-32B-Instruct-4bit
-date: 2024-11-22
+date: 2024-11-23
 versions: 0.64.2.dev
+seconds_per_case: 53.4
+total_cost: 0.0000
+
+- dirname: 2024-11-23-15-07-20--qwen25-coder-32b-Instruct-8bit
+test_cases: 133
+model: mlx-community/Qwen2.5-Coder-32B-Instruct-8bit
+edit_format: diff
+commit_hash: a16dcab-dirty
+pass_rate_1: 59.4
+pass_rate_2: 72.2
+percent_cases_well_formed: 92.5
+error_outputs: 20
+num_malformed_responses: 15
+num_with_malformed_responses: 10
+user_asks: 7
+lazy_comments: 0
+syntax_errors: 0
+indentation_errors: 0
+exhausted_context_windows: 5
+test_timeouts: 2
+command: aider --model openai/mlx-community/Qwen2.5-Coder-32B-Instruct-8bit
+date: 2024-11-23
+versions: 0.64.2.dev
-seconds_per_case: 173.7
+seconds_per_case: 98.4
 total_cost: 0.0000

 - dirname: 2024-11-20-15-17-37--qwen25-32b-or-diff
6 changes: 2 additions & 4 deletions aider/website/_posts/2024-11-21-quantization.md
@@ -16,7 +16,7 @@ aider's code editing benchmark, rivaling closed source frontier models.
 But pay attention to how your model is being quantized, as it
 can strongly impact code editing skill.
 Heavily quantized models are often used by cloud API providers
-and local model servers like Ollama.
+and local model servers like Ollama or MLX.

 <canvas id="quantChart" width="800" height="600" style="margin: 20px 0"></canvas>
 <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
@@ -29,9 +29,7 @@ served both locally and from cloud providers.

 - The [HuggingFace BF16 weights](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) served via [glhf.chat](https://glhf.chat).
 - Hyperbolic labs API for [qwen2-5-coder-32b-instruct](https://app.hyperbolic.xyz/models/qwen2-5-coder-32b-instruct), which is using BF16. This result is probably within the expected variance of the HF result.
-- A [4bit quant for mlx](https://t.co/cwX3DYX35D).
-This is the only model which was benchmarked using the "whole" [edit format](https://aider.chat/docs/more/edit-formats.html).
-The rest were benchmarked with the much more practical and challenging "diff"edit format.
+- A [4bit quant for mlx](https://t.co/cwX3DYX35D).
 - The results from [OpenRouter's mix of providers](https://openrouter.ai/qwen/qwen-2.5-coder-32b-instruct/providers) which serve the model with different levels of quantization.
 - Ollama locally serving [qwen2.5-coder:32b-instruct-q4_K_M)](https://ollama.com/library/qwen2.5-coder:32b-instruct-q4_K_M), which has `Q4_K_M` quantization, with Ollama's default 2k context window.
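
As a rough sanity check on the two MLX entries this commit adds to quant.yml, the percentages can be turned back into case counts. This is a minimal sketch, not part of the commit: the `entries` dict and the `derived_counts` helper are hypothetical names, the values are copied by hand from the diff, and it assumes that `pass_rate_2` and `percent_cases_well_formed` are both percentages of `test_cases`.

```python
# Values copied by hand from the 4bit and 8bit MLX entries in quant.yml.
# The dict layout and helper name are illustrative, not aider's own code.
entries = {
    "mlx-4bit": {
        "test_cases": 133,
        "pass_rate_2": 72.2,
        "percent_cases_well_formed": 88.7,
        "num_with_malformed_responses": 15,
    },
    "mlx-8bit": {
        "test_cases": 133,
        "pass_rate_2": 72.2,
        "percent_cases_well_formed": 92.5,
        "num_with_malformed_responses": 10,
    },
}


def derived_counts(entry):
    """Return (cases passed, cases with a malformed response).

    Assumes both percentages are taken over test_cases.
    """
    n = entry["test_cases"]
    passed = round(n * entry["pass_rate_2"] / 100)
    well_formed = round(n * entry["percent_cases_well_formed"] / 100)
    return passed, n - well_formed


for name, entry in entries.items():
    passed, malformed = derived_counts(entry)
    # The derived malformed count should agree with the reported field.
    matches = malformed == entry["num_with_malformed_responses"]
    print(f"{name}: passed={passed} malformed={malformed} consistent={matches}")
```

Under that assumption both quants pass the same number of cases, and the derived malformed counts agree with the reported `num_with_malformed_responses` values, so the visible difference between the two runs is in edit-format adherence and speed rather than in the final pass rate.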
