Modify the check_results.py to support batch 2&4 #11133

Status: Merged (45 commits, Jun 5, 2024). Changes shown from 27 commits.

Commits
a353493 add batch 2&4 and exclude to perf_test (MeouSker77, May 24, 2024)
97b2b2f modify the perf-test&437 yaml (MeouSker77, May 24, 2024)
e366b6f modify llm_performance_test.yml (MeouSker77, May 24, 2024)
b567979 remove batch 4 (MeouSker77, May 24, 2024)
114d6d2 modify check_results.py to support batch 2&4 (MeouSker77, May 24, 2024)
d4aead4 change the batch_size format (MeouSker77, May 27, 2024)
ecbf7b0 remove genxir (MeouSker77, May 27, 2024)
8dff59a add str(batch_size) (MeouSker77, May 27, 2024)
ea9f99c change actual_test_casese in check_results file to support batch_size (MeouSker77, May 27, 2024)
b7f6f0a change html highlight (MeouSker77, May 28, 2024)
0fdd6ab less models to test html and html_path (MeouSker77, May 28, 2024)
fdd6c1f delete the moe model (MeouSker77, May 28, 2024)
f6e99b6 split batch html (MeouSker77, May 28, 2024)
8874b9f split (MeouSker77, May 28, 2024)
9b47f56 use installing from pypi (MeouSker77, May 28, 2024)
ce0cf62 use installing from pypi - batch2 (MeouSker77, May 28, 2024)
bdd90b1 revert cpp (MeouSker77, May 28, 2024)
94317bd revert cpp (MeouSker77, May 28, 2024)
b656909 merge two jobs into one, test batch_size in one job (MeouSker77, May 28, 2024)
c2acf65 merge two jobs into one, test batch_size in one job (MeouSker77, May 28, 2024)
7452afb change file directory in workflow (MeouSker77, May 29, 2024)
73e7a37 try catch deal with odd file without batch_size (MeouSker77, May 29, 2024)
11cb3f6 modify pandas version (MeouSker77, May 29, 2024)
b77dbc9 change the dir (MeouSker77, May 29, 2024)
f1e6271 organize the code (MeouSker77, May 30, 2024)
b210316 organize the code (MeouSker77, May 30, 2024)
28baf78 remove Qwen-MOE (MeouSker77, May 30, 2024)
c28750a modify based on feedback (MeouSker77, Jun 3, 2024)
b3d6d6a modify based on feedback (MeouSker77, Jun 3, 2024)
d54f94b modify based on second round of feedback (MeouSker77, Jun 3, 2024)
52cd168 modify based on second round of feedback + change run-arc.sh mode (MeouSker77, Jun 3, 2024)
ddaafa6 modify based on second round of feedback + revert config (MeouSker77, Jun 3, 2024)
cf4b9c1 modify based on second round of feedback + revert config (MeouSker77, Jun 3, 2024)
5d29cfb modify based on second round of feedback + remove comments (MeouSker77, Jun 3, 2024)
8c20117 modify based on second round of feedback + remove comments (MeouSker77, Jun 3, 2024)
188080b modify based on second round of feedback + revert arc-perf-test (MeouSker77, Jun 3, 2024)
2d054c4 modify based on third round of feedback (MeouSker77, Jun 3, 2024)
883bcff change error type (MeouSker77, Jun 4, 2024)
e37b806 change error type (MeouSker77, Jun 4, 2024)
441d48b modify check_results.html (MeouSker77, Jun 4, 2024)
ed5e0ce split batch into two folders (MeouSker77, Jun 4, 2024)
14b9e97 add all models (MeouSker77, Jun 5, 2024)
5374ba9 move csv_name (MeouSker77, Jun 5, 2024)
a937507 revert pr test (MeouSker77, Jun 5, 2024)
1b2600d revert pr test (MeouSker77, Jun 5, 2024)
79 changes: 59 additions & 20 deletions .github/workflows/llm_performance_tests.yml
@@ -10,15 +10,15 @@ permissions:

# Controls when the action will run.
on:
schedule:
- cron: "30 16 * * *" # GMT time, 16:30 GMT == 00:30 China
# schedule:
# - cron: "30 16 * * *" # GMT time, 16:30 GMT == 00:30 China
# please uncomment it for PR tests
# pull_request:
# branches: [main]
# paths:
# - ".github/workflows/llm_performance_tests.yml"
# - "python/llm/test/benchmark/**"
# - "python/llm/dev/benchmark/all-in-one/**"
pull_request:
branches: [main]
paths:
- ".github/workflows/llm_performance_tests.yml"
- "python/llm/test/benchmark/**"
- "python/llm/dev/benchmark/all-in-one/**"
workflow_dispatch:
workflow_call:

@@ -28,7 +28,7 @@ jobs:
# uses: ./.github/workflows/llm-binary-build.yml

llm-performance-test-on-arc:
if: ${{ github.event.schedule || github.event_name == 'workflow_dispatch' || github.event.inputs.artifact == 'llm-performance-test-on-arc' || github.event.inputs.artifact == 'all' }} # please comment it for PR tests
# if: ${{ github.event.schedule || github.event_name == 'workflow_dispatch' || github.event.inputs.artifact == 'llm-performance-test-on-arc' || github.event.inputs.artifact == 'all' }} # please comment it for PR tests
# needs: llm-cpp-build # please uncomment it for PR tests
strategy:
fail-fast: false
@@ -75,11 +75,11 @@ jobs:
shell: bash
run: |
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
test_version_date=`date -d 'yesterday' '+%Y%m%d'`
if ! pip show ipex-llm | grep $test_version_date; then
echo "Did not install ipex-llm with excepted version $test_version_date"
exit 1
fi
# test_version_date=`date -d 'yesterday' '+%Y%m%d'`
# if ! pip show ipex-llm | grep $test_version_date; then
# echo "Did not install ipex-llm with excepted version $test_version_date"
# exit 1
# fi

- name: Test installed xpu version
shell: bash
@@ -95,13 +95,21 @@ jobs:
source /opt/intel/oneapi/setvars.sh
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
# batch_size 1
cp python/llm/test/benchmark/arc-perf-test.yaml python/llm/dev/benchmark/all-in-one/config.yaml
cd python/llm/dev/benchmark/all-in-one
mkdir test_batch1
mkdir test_batch2
# hide time info
sed -i 's/str(end - st)/"xxxxxx"/g' run.py
# change csv name
sed -i 's/{today}/{today}_test1/g' run.py
python run.py
# batch_size 2
cd ../../../../../
cp python/llm/test/benchmark/arc-perf-test-batch2.yaml python/llm/dev/benchmark/all-in-one/config.yaml
cd python/llm/dev/benchmark/all-in-one
python run.py

- name: Test on xpu(transformers==4.37.0)
shell: bash
@@ -111,33 +111,64 @@ jobs:
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
# upgrade transformers for model Qwen/Qwen1.5-7B-Chat
python -m pip install transformers==4.37.0
# batch_size 1
cp python/llm/test/benchmark/arc-perf-transformers-437.yaml python/llm/dev/benchmark/all-in-one/config.yaml
cd python/llm/dev/benchmark/all-in-one
# change csv name
sed -i 's/test1/test2/g' run.py
python run.py
# batch_size 2
cd ../../../../../
cp python/llm/test/benchmark/arc-perf-transformers-437-batch2.yaml python/llm/dev/benchmark/all-in-one/config.yaml
cd python/llm/dev/benchmark/all-in-one
python run.py

- name: Concat csv and generate html
shell: bash
run: |
cd python/llm/dev/benchmark/all-in-one
python ../../../test/benchmark/concat_csv.py
# batch_size 1
cd python/llm/dev/benchmark/all-in-one/test_batch1
python ../../../../test/benchmark/concat_csv.py
for file in *.csv; do
if [[ $file != *test* ]]; then
cp "$file" $CSV_SAVE_PATH
fi
done
python -m pip install pandas==1.5.3
cd ../../../test/benchmark
cd ../../../../test/benchmark
python csv_to_html.py -f $CSV_SAVE_PATH
# batch_size 2
cd ../../../../
cd python/llm/dev/benchmark/all-in-one/test_batch2
python ../../../../test/benchmark/concat_csv.py
for file in *.csv; do
if [[ $file != *test* ]]; then
cp "$file" $CSV_SAVE_PATH
fi
done
cd ../../../../test/benchmark
python csv_to_html.py -f $CSV_SAVE_PATH

- name: Check and upload results to ftp
shell: bash
run: |
cd python/llm/dev/benchmark/all-in-one
python ../../../test/benchmark/check_results.py -c test1 -y ../../../test/benchmark/arc-perf-test.yaml
python ../../../test/benchmark/check_results.py -c test2 -y ../../../test/benchmark/arc-perf-transformers-437.yaml
# batch_size 1
cd python/llm/dev/benchmark/all-in-one/test_batch1
python ../../../../test/benchmark/check_results.py -c test1 -y ../../../../test/benchmark/arc-perf-test.yaml
python ../../../../test/benchmark/check_results.py -c test2 -y ../../../../test/benchmark/arc-perf-transformers-437.yaml
find . -name "*test*.csv" -delete
cd ../
rm -r test_batch1
if [ ${{ github.event_name }} == "schedule" ] || [ ${{ github.event_name }} == "workflow_dispatch" ]; then
curl -T ./*.csv ${LLM_FTP_URL}/llm/nightly_perf/gpu/
fi
# batch_size 2
cd test_batch2
python ../../../../test/benchmark/check_results.py -c test1 -y ../../../../test/benchmark/arc-perf-test-batch2.yaml
python ../../../../test/benchmark/check_results.py -c test2 -y ../../../../test/benchmark/arc-perf-transformers-437-batch2.yaml
find . -name "*test*.csv" -delete
cd ../
rm -r test_batch2
if [ ${{ github.event_name }} == "schedule" ] || [ ${{ github.event_name }} == "workflow_dispatch" ]; then
curl -T ./*.csv ${LLM_FTP_URL}/llm/nightly_perf/gpu/
fi
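For orientation, the reworked workflow boils down to: run the benchmark once per batch size, with a dedicated config yaml and a dedicated test_batchN results folder each time. Below is a minimal Python sketch of that loop, assuming the repository layout shown in the diff; the `run_batches` helper itself is hypothetical, not part of this PR:

```python
import shutil
import subprocess
from pathlib import Path

# Per-batch config yamls, as named in this PR's diff.
BATCH_CONFIGS = {
    1: ["arc-perf-test.yaml", "arc-perf-transformers-437.yaml"],
    2: ["arc-perf-test-batch2.yaml", "arc-perf-transformers-437-batch2.yaml"],
}

def run_batches(repo_root: Path) -> None:
    bench = repo_root / "python/llm/dev/benchmark/all-in-one"
    for batch, yamls in BATCH_CONFIGS.items():
        # run.py writes its csv into test_batch{N} (see the run.py diff below)
        (bench / f"test_batch{batch}").mkdir(exist_ok=True)
        for name in yamls:
            shutil.copy(repo_root / "python/llm/test/benchmark" / name,
                        bench / "config.yaml")
            subprocess.run(["python", "run.py"], cwd=bench, check=True)
```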
3 changes: 2 additions & 1 deletion python/llm/dev/benchmark/all-in-one/config.yaml
@@ -7,7 +7,8 @@ warm_up: 1 # must set >=2 when run "pipeline_parallel_gpu" test_api
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
batch_size:
- 1 # default to 1
Contributor review comment: revert this?

in_out_pairs:
- '32-32'
- '1024-128'
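The review comment above is about backward compatibility: older yamls (and the all-in-one default) used the scalar form `batch_size: 1`, while this PR moves to a list. One way run.py could accept both forms, sketched under that assumption rather than taken from this PR:

```python
from omegaconf import OmegaConf

def batch_sizes(conf):
    """Normalize batch_size: accept the legacy scalar form
    (batch_size: 1) as well as the new list form (batch_size: [1, 2])."""
    value = conf["batch_size"]
    if isinstance(value, (int, str)):
        return [int(value)]              # legacy scalar yaml
    return [int(v) for v in value]       # new list yaml

conf = OmegaConf.load("config.yaml")
for batch_size in batch_sizes(conf):
    print(f"would run benchmark with batch_size={batch_size}")
```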
26 changes: 14 additions & 12 deletions python/llm/dev/benchmark/all-in-one/run.py
@@ -1822,18 +1822,20 @@ def run_pipeline_parallel_gpu(repo_id,

import pandas as pd
for api in conf.test_api:
global csv_name
csv_name = f'{current_dir}/{api}-results-{today}.csv'
for model in conf.repo_id:
in_out_pairs = conf['in_out_pairs'].copy()
if excludes:
for in_out in conf['in_out_pairs']:
model_id_input = model + ':' + in_out.split('-')[0]
model_id_input_batch_size = model_id_input + ':' + str(conf['batch_size'])
if model_id_input in excludes or model_id_input_batch_size in excludes:
in_out_pairs.remove(in_out)
run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
conf['low_bit'], conf['cpu_embedding'], conf['batch_size'], streaming, use_fp16_torch_dtype, n_gpu)
for batch_size in conf["batch_size"]:
global csv_name
batch = str(batch_size)
csv_name = f'{current_dir}/test_batch{batch}/{api}-results-{today}-batch-{batch}.csv'
Contributor review comment: better not add a folder in this file.

for model in conf.repo_id:
in_out_pairs = conf['in_out_pairs'].copy()
if excludes:
for in_out in conf['in_out_pairs']:
model_id_input = model + ':' + in_out.split('-')[0]
model_id_input_batch_size = model_id_input + ':' + str(batch_size)
if model_id_input in excludes or model_id_input_batch_size in excludes:
in_out_pairs.remove(in_out)
run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
conf['low_bit'], conf['cpu_embedding'], batch_size, streaming, use_fp16_torch_dtype, n_gpu)
df = pd.DataFrame(results, columns=['model', '1st token avg latency (ms)', '2+ avg latency (ms/token)', 'encoder time (ms)',
'input/output tokens', 'batch_size', 'actual input/output tokens', 'num_beams', 'low_bit', 'cpu_embedding',
'model loading time (s)', 'peak mem (GB)', 'streaming', 'use_fp16_torch_dtype'])
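To make the new exclude semantics concrete: with the batch loop in place, an exclude entry can match either `model:input_len` (the old form) or `model:input_len:batch_size` (the new form), and run.py honors both. A self-contained sketch of that filtering, with a hypothetical helper name:

```python
def filter_in_out_pairs(model, in_out_pairs, batch_size, excludes):
    """Keep only the in/out pairs not excluded for this model, either
    unconditionally ('model:input_len') or for this specific batch
    size ('model:input_len:batch_size')."""
    kept = []
    for in_out in in_out_pairs:
        key = f"{model}:{in_out.split('-')[0]}"
        if key in excludes or f"{key}:{batch_size}" in excludes:
            continue
        kept.append(in_out)
    return kept

# With the batch-2 yaml below, the 2048-256 case for bloomz is skipped:
print(filter_in_out_pairs('bigscience/bloomz-7b1',
                          ['32-32', '1024-128', '2048-256'], 2,
                          ['bigscience/bloomz-7b1:2048:2']))
# ['32-32', '1024-128']
```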
39 changes: 39 additions & 0 deletions python/llm/test/benchmark/arc-perf-test-batch2.yaml
@@ -0,0 +1,39 @@
repo_id:
- 'meta-llama/Llama-2-7b-chat-hf'
- 'meta-llama/Llama-2-13b-chat-hf'
- 'THUDM/chatglm2-6b'
- 'THUDM/chatglm3-6b-4bit'
- 'tiiuae/falcon-7b-instruct-with-patch'
- 'mosaicml/mpt-7b-chat'
- 'redpajama/gptneox-7b-redpajama-bf16'
- 'bigcode/starcoder-15.5b-4bit'
- 'databricks/dolly-v1-6b'
- 'databricks/dolly-v2-7b'
- 'databricks/dolly-v2-12b'
- 'internlm/internlm-chat-7b'
- 'Qwen/Qwen-7B-Chat'
- 'BAAI/AquilaChat-7B'
- 'baichuan-inc/Baichuan2-7B-Chat'
- 'baichuan-inc/Baichuan2-13B-Chat-4bit'
- 'bigscience/bloomz-7b1'
# - 'fnlp/moss-moon-003-sft-4bit' # moss-moon-003-sft cannot work on transformers 4.34+
- 'mistralai/Mistral-7B-v0.1'
local_model_hub: '/mnt/disk1/models'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: # default to 1
- 2
in_out_pairs:
- '32-32'
- '1024-128'
- '2048-256'
test_api:
- "transformer_int4_gpu" # on Intel GPU
cpu_embedding: False # whether put embedding to CPU (only avaiable now for gpu win related test_api)
exclude:
- 'bigcode/starcoder-15.5b-4bit:2048:2'
- 'databricks/dolly-v2-12b:2048:2'
- 'baichuan-inc/Baichuan2-13B-Chat-4bit:2048:2'
- 'bigscience/bloomz-7b1:2048:2'
9 changes: 4 additions & 5 deletions python/llm/test/benchmark/arc-perf-test.yaml
@@ -23,7 +23,8 @@ warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
batch_size: # default to 1
Contributor review comment: revert this file

- 1
in_out_pairs:
- '32-32'
- '1024-128'
@@ -32,7 +33,5 @@ test_api:
- "transformer_int4_gpu" # on Intel GPU
cpu_embedding: False # whether put embedding to CPU (only avaiable now for gpu win related test_api)
exclude:
# - 'fnlp/moss-moon-003-sft-4bit:1024'
# - 'fnlp/moss-moon-003-sft-4bit:2048'
- 'baichuan-inc/Baichuan2-13B-Chat-4bit:2048'
- 'bigscience/bloomz-7b1:2048'
- 'baichuan-inc/Baichuan2-13B-Chat-4bit:2048:1'
- 'bigscience/bloomz-7b1:2048:1'
Review comment (hkvision, Jun 3, 2024): revert this? make run.py compatible with the original yaml.

21 changes: 21 additions & 0 deletions python/llm/test/benchmark/arc-perf-transformers-437-batch2.yaml
@@ -0,0 +1,21 @@
# For the models that require transformers 4.37.0
repo_id:
- 'Qwen/Qwen1.5-7B-Chat'
- 'microsoft/phi-2'
- 'microsoft/Phi-3-mini-4k-instruct'
- 'meta-llama/Meta-Llama-3-8B-Instruct'
local_model_hub: '/mnt/disk1/models'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: # default to 1
- 2
in_out_pairs:
- '32-32'
- '1024-128'
- '2048-256'
test_api:
- "transformer_int4_gpu" # on Intel GPU
cpu_embedding: False # whether put embedding to CPU (only avaiable now for gpu win related test_api)
exclude:
4 changes: 3 additions & 1 deletion python/llm/test/benchmark/arc-perf-transformers-437.yaml
@@ -9,11 +9,13 @@ warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
batch_size: # default to 1
Contributor review comment: revert this file

- 1
in_out_pairs:
- '32-32'
- '1024-128'
- '2048-256'
test_api:
- "transformer_int4_gpu" # on Intel GPU
cpu_embedding: False # whether put embedding to CPU (only avaiable now for gpu win related test_api)
exclude:
Contributor review comment: revert this if no exclude?

8 changes: 4 additions & 4 deletions python/llm/test/benchmark/check_results.py
@@ -34,16 +34,16 @@ def main():
actual_test_num = len(csv_dataframe)
actual_test_cases = []
for index, row in csv_dataframe.iterrows():
actual_test_cases.append(row['model'] + ":" + row['input/output tokens'].split('-')[0])

actual_test_cases.append(row['model'] + ":" + row['input/output tokens'].split('-')[0] + ":" + str(row['batch_size']))
if args.yaml_name:
yaml_name = args.yaml_name
conf = OmegaConf.load(yaml_name)
all_test_cases = []
for model in conf.repo_id:
for in_out in conf['in_out_pairs']:
model_id_input = model + ':' + in_out.split('-')[0]
all_test_cases.append(model_id_input)
for batch_size in conf['batch_size']:
model_id_input = model + ':' + in_out.split('-')[0] + ':' + str(batch_size)
all_test_cases.append(model_id_input)
exclude_test_cases = []
if 'exclude' in conf and conf['exclude'] is not None:
exclude_test_cases = conf['exclude']
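Put differently, check_results.py now compares batch-aware keys: each csv row contributes `model:input_len:batch_size`, and the expected set is the cross product repo_id × in_out_pairs × batch_size from the yaml, minus its excludes. A condensed sketch of that comparison (the function name is ours, not the PR's):

```python
def find_missing_cases(rows, conf):
    """rows: dict-like csv rows; conf: the OmegaConf-loaded yaml."""
    actual = {f"{r['model']}:{r['input/output tokens'].split('-')[0]}"
              f":{r['batch_size']}" for r in rows}
    expected = {f"{m}:{io.split('-')[0]}:{b}"
                for m in conf['repo_id']
                for io in conf['in_out_pairs']
                for b in conf['batch_size']}
    expected -= set(conf.get('exclude') or [])
    return sorted(expected - actual)
```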
20 changes: 13 additions & 7 deletions python/llm/test/benchmark/csv_to_html.py
@@ -99,20 +99,25 @@ def main():
for current_csv_ind,current_csv_row in current_csv.iterrows():
current_csv_model=current_csv_row['model'].strip()
current_csv_input_output_pairs=current_csv_row['input/output tokens'].strip()
current_csv_model_input_1st=current_csv_model+'-'+current_csv_input_output_pairs+'-'+'1st'
current_csv_model_input_2nd=current_csv_model+'-'+current_csv_input_output_pairs+'-'+'2nd'
add_to_dict(csv_dict, current_csv_model_input_1st, current_csv_row[latency_1st_token])
add_to_dict(csv_dict, current_csv_model_input_2nd, current_csv_row[latency_2_avg])
try:
Contributor review comment: under what case this try will fail?

current_csv_batch_size=str(current_csv_row['batch_size'])
current_csv_model_input_1st=current_csv_model+'-'+current_csv_input_output_pairs+'-'+current_csv_batch_size+'-'+'1st'
current_csv_model_input_2nd=current_csv_model+'-'+current_csv_input_output_pairs+'-'+current_csv_batch_size+'-'+'2nd'
add_to_dict(csv_dict, current_csv_model_input_1st, current_csv_row[latency_1st_token])
add_to_dict(csv_dict, current_csv_model_input_2nd, current_csv_row[latency_2_avg])
except:
pass

for latest_csv_ind,latest_csv_row in latest_csv.iterrows():

latest_csv_model=latest_csv_row['model'].strip()
latest_csv_input_output_pairs=latest_csv_row['input/output tokens'].strip()
latest_1st_token_latency=latest_csv_row[latency_1st_token]
latest_2_avg_latency=latest_csv_row[latency_2_avg]
latest_csv_batch_size=str(latest_csv_row['batch_size'])

key1=latest_csv_model+'-'+latest_csv_input_output_pairs+'-'+'1st'
key2=latest_csv_model+'-'+latest_csv_input_output_pairs+'-'+'2nd'
key1=latest_csv_model+'-'+latest_csv_input_output_pairs+'-'+latest_csv_batch_size+'-'+'1st'
key2=latest_csv_model+'-'+latest_csv_input_output_pairs+'-'+latest_csv_batch_size+'-'+'2nd'

best_last1_value=best_in_dict(csv_dict, key1, latest_1st_token_latency)
best_last2_value=best_in_dict(csv_dict, key2, latest_2_avg_latency)
@@ -128,8 +133,9 @@ def main():

previous_csv_model=previous_csv_row['model'].strip()
previous_csv_input_output_pairs=previous_csv_row['input/output tokens'].strip()
previous_csv_batch_size=str(previous_csv_row['batch_size'])

if latest_csv_model==previous_csv_model and latest_csv_input_output_pairs==previous_csv_input_output_pairs:
if latest_csv_model==previous_csv_model and latest_csv_input_output_pairs==previous_csv_input_output_pairs and latest_csv_batch_size==previous_csv_batch_size:

previous_1st_token_latency=previous_csv_row[latency_1st_token]
previous_2_avg_latency=previous_csv_row[latency_2_avg]
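On the reviewer's question above ("under what case this try will fail?"): the bare try/except appears to guard against csv files generated before this PR, which lack a batch_size column. An explicit column check states that intent more clearly than a blanket `except: pass`; a hedged alternative sketch, not code from this PR:

```python
from typing import Optional

def row_key(row, suffix: str) -> Optional[str]:
    """Build the batch-aware comparison key for a csv row, or return
    None for legacy csvs that predate the batch_size column."""
    if 'batch_size' not in row:
        return None  # old csv without batch_size: skip it explicitly
    return (f"{row['model'].strip()}-"
            f"{row['input/output tokens'].strip()}-"
            f"{row['batch_size']}-{suffix}")
```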