Commit: Add Llama 3.2 to iGPU performance test (transformers 4.45) (#12209)

* Add Llama 3.2 to iGPU Perf (#12200)
  * Add Llama 3.2 to iGPU Perf
  * Downgrade accelerate after step
  * Temporarily disable model for test
* Temporarily change ERRORLEVEL check (#12201)
* Restore llama3.2 perf (#12206)
  * Revert "Temporarily change ERRORLEVEL check" (reverts commit 909dbbc)
  * Revert "Temporarily disable model for test" (reverts commit 95322dc)

Co-authored-by: Jin, Qiao <[email protected]>
1 parent: f6611f9 · commit: c9ac39f

Showing 8 changed files with 307 additions and 6 deletions.
New file (14 additions):

```yaml
repo_id:
  - 'meta-llama/Llama-3.2-1B-Instruct'
  - 'meta-llama/Llama-3.2-3B-Instruct'
local_model_hub: 'path to your local model hub'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '1024-128'
test_api:
  - "transformer_int4_gpu_win" # on Intel GPU for Windows (catch GPU peak memory)
cpu_embedding: True # whether to put embedding on CPU (currently only available for gpu win related test_api)
```
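To make the config fields concrete, here is a hypothetical sketch of how a benchmark harness might expand one of these YAML configs into individual runs: one run per combination of `repo_id`, `in_out_pairs` entry, and `test_api`, with each `'input-output'` pair split into token counts. The `expand_runs` helper is illustrative only, not actual ipex-llm code; the dict mirrors the config above so the example stays self-contained.

```python
from itertools import product

# Mirrors the YAML config above (benchmark-relevant fields only).
config = {
    "repo_id": [
        "meta-llama/Llama-3.2-1B-Instruct",
        "meta-llama/Llama-3.2-3B-Instruct",
    ],
    "warm_up": 1,
    "num_trials": 3,
    "in_out_pairs": ["1024-128"],
    "test_api": ["transformer_int4_gpu_win"],
}

def expand_runs(cfg):
    """Yield one (model, input_len, output_len, api) tuple per benchmark run."""
    for repo, pair, api in product(
        cfg["repo_id"], cfg["in_out_pairs"], cfg["test_api"]
    ):
        # '1024-128' means 1024 input tokens and 128 generated tokens.
        in_len, out_len = (int(x) for x in pair.split("-"))
        yield repo, in_len, out_len, api

runs = list(expand_runs(config))
# Two models x one in/out pair x one test_api -> 2 runs
```

Each run would then be repeated `warm_up + num_trials` times by the harness, with only the last `num_trials` iterations counted.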
`python/llm/test/benchmark/igpu-perf/1024-128_int4_fp16_445.yaml` (new file, 14 additions):
```yaml
repo_id:
  - 'meta-llama/Llama-3.2-1B-Instruct'
  - 'meta-llama/Llama-3.2-3B-Instruct'
local_model_hub: 'path to your local model hub'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '1024-128'
test_api:
  - "transformer_int4_fp16_gpu_win" # on Intel GPU for Windows, use fp16 for non-linear layer
cpu_embedding: True # whether to put embedding on CPU (currently only available for gpu win related test_api)
```
`python/llm/test/benchmark/igpu-perf/1024-128_int4_fp16_loadlowbit_445.yaml` (new file, 14 additions):
```yaml
repo_id:
  - 'meta-llama/Llama-3.2-1B-Instruct'
  - 'meta-llama/Llama-3.2-3B-Instruct'
local_model_hub: 'path to your local model hub'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '1024-128'
test_api:
  - "transformer_int4_fp16_loadlowbit_gpu_win" # on Intel GPU for Windows (catch GPU peak memory)
cpu_embedding: True # whether to put embedding on CPU (currently only available for gpu win related test_api)
```
`python/llm/test/benchmark/igpu-perf/2048-256_int4_fp16_445.yaml` (new file, 14 additions):
```yaml
repo_id:
  - 'meta-llama/Llama-3.2-1B-Instruct'
  - 'meta-llama/Llama-3.2-3B-Instruct'
local_model_hub: 'path to your local model hub'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '2048-256'
test_api:
  - "transformer_int4_fp16_gpu_win" # on Intel GPU for Windows (catch GPU peak memory)
cpu_embedding: True # whether to put embedding on CPU (currently only available for gpu win related test_api)
```
`python/llm/test/benchmark/igpu-perf/3072-384_int4_fp16_445.yaml` (new file, 14 additions):
```yaml
repo_id:
  - 'meta-llama/Llama-3.2-1B-Instruct'
  - 'meta-llama/Llama-3.2-3B-Instruct'
local_model_hub: 'path to your local model hub'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '3072-384'
test_api:
  - "transformer_int4_fp16_gpu_win" # on Intel GPU for Windows (catch GPU peak memory)
cpu_embedding: True # whether to put embedding on CPU (currently only available for gpu win related test_api)
```
`python/llm/test/benchmark/igpu-perf/32-32_int4_fp16_445.yaml` (new file, 14 additions):
```yaml
repo_id:
  - 'meta-llama/Llama-3.2-1B-Instruct'
  - 'meta-llama/Llama-3.2-3B-Instruct'
local_model_hub: 'path to your local model hub'
warm_up: 3
num_trials: 5
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '32-32'
test_api:
  - "transformer_int4_fp16_gpu_win" # on Intel GPU for Windows (catch GPU peak memory)
cpu_embedding: True # whether to put embedding on CPU (currently only available for gpu win related test_api)
```
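The `warm_up` and `num_trials` fields in the configs above follow the usual benchmarking pattern: run a few untimed warm-up iterations first, then report an average over the timed trials. The sketch below illustrates that pattern only; it is not the actual ipex-llm benchmark loop, and `dummy_generate` is a hypothetical stand-in for a real generation call.

```python
import time

def timed_average(fn, warm_up, num_trials):
    """Call fn() warm_up times untimed, then average num_trials timed calls."""
    for _ in range(warm_up):
        fn()                          # warm-up iterations: timing discarded
    latencies = []
    for _ in range(num_trials):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return sum(latencies) / len(latencies)

calls = {"n": 0}

def dummy_generate():
    calls["n"] += 1                   # stand-in for a model.generate() call

# With warm_up=1, num_trials=3 (as in most configs above), the workload
# runs 4 times in total, but only the last 3 count toward the average.
avg_latency = timed_average(dummy_generate, warm_up=1, num_trials=3)
```

The `32-32` config uses a larger `warm_up` of 3 and `num_trials` of 5, presumably because very short runs are noisier and benefit from more repetitions.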