add cache emptying between forward and generate
IlyasMoutawwakil committed Oct 31, 2023
1 parent d650c0d commit e31cd4c
Showing 55 changed files with 375 additions and 263 deletions.
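
The source change behind the commit title is not visible in this view. As a minimal sketch of the idea, assuming the benchmark measures a `forward` phase and then a `generate` phase in the same process, the helper below releases cached device memory before each phase so that the peak-memory figures of one phase are not inflated by blocks cached during the previous one. The names `run_phases` and `empty_device_cache` are hypothetical and are not taken from optimum-benchmark; only `torch.cuda.empty_cache()` and `torch.cuda.reset_peak_memory_stats()` are real PyTorch calls.

```python
import gc


def _torch_with_cuda():
    """Return the torch module if it is installed and CUDA is usable, else None."""
    try:
        import torch
    except ImportError:
        return None
    return torch if torch.cuda.is_available() else None


def empty_device_cache():
    """Collect unreferenced Python objects, then release cached CUDA blocks."""
    gc.collect()
    torch = _torch_with_cuda()
    if torch is not None:
        torch.cuda.empty_cache()              # hand cached blocks back to the driver
        torch.cuda.reset_peak_memory_stats()  # start peak tracking from a clean state


def run_phases(phases, cleanup=empty_device_cache):
    """Run (name, callable) phases in order, calling cleanup before each phase."""
    results = {}
    for name, phase in phases:
        cleanup()
        results[name] = phase()
    return results


if __name__ == "__main__":
    # Stand-in phases; a real benchmark would call the model here (assumption).
    print(run_phases([("forward", lambda: "forward done"),
                      ("generate", lambda: "generate done")]))
```

Passing the cleanup step as a parameter keeps the sketch usable on machines without a GPU and makes the call order easy to check.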
6 changes: 2 additions & 4 deletions examples/running-llamas/README.md
@@ -1,4 +1,4 @@
-# Optimum-Benchmark x LLaMAs x BnB & GPTQ
+# Optimum-Benchmark x LLaMAs x GPTQ

A set of benchmarks on Meta's LLaMA2's inference.

@@ -7,7 +7,6 @@ A set of benchmarks on Meta's LLaMA2's inference.
You will need to install these quantization packages:

```bash
-pip install bitsandbytes
pip install auto-gptq # or install it from source
```

@@ -17,11 +16,10 @@ Then run these commands from this directory:

```bash
optimum-benchmark --config-dir configs/ --config-name _base_ --multirun
-optimum-benchmark --config-dir configs/ --config-name bnb --multirun
optimum-benchmark --config-dir configs/ --config-name gptq --multirun
```

-This will create a folder called `experiments` with the results of the benchmarks with an inference `batch_size` ranging from 1 to 16 and an input `sequence_length` (prompt size) of 512.
+This will create a folder called `experiments` with the results of the benchmarks with an inference `batch_size` ranging from 1 to 16 and an input `sequence_length` (prompt size) of 256.
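
To make the size of the sweep concrete, the sketch below enumerates the experiment names that appear in `full_report.csv` for this setup. It assumes the batch sizes are the powers of two from 1 to 16 and that `new_tokens` is 512, both read off the report rather than the configs, which are not shown here. The helper `experiment_names` is hypothetical.

```python
from itertools import product

# Values read from the report rows (assumption: the configs sweep exactly these).
SCHEMES = ("fp16", "gptq")
BATCH_SIZES = (1, 2, 4, 8, 16)
SEQUENCE_LENGTH = 256
NEW_TOKENS = 512


def experiment_names():
    """Return one name per (scheme, batch_size) pair, in the report's naming style."""
    return [
        f"{scheme}-batch_size({batch_size})"
        f"-sequence_length({SEQUENCE_LENGTH})-new_tokens({NEW_TOKENS})"
        for scheme, batch_size in product(SCHEMES, BATCH_SIZES)
    ]


if __name__ == "__main__":
    for name in experiment_names():
        print(name)
```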

## Reporting

20 changes: 10 additions & 10 deletions examples/running-llamas/artifacts/A100-80GB/full_report.csv
@@ -1,11 +1,11 @@
experiment_name,backend.name,backend.version,backend._target_,backend.seed,backend.inter_op_num_threads,backend.intra_op_num_threads,backend.initial_isolation_check,backend.continous_isolation_check,backend.delete_cache,backend.no_weights,backend.device_map,backend.torch_dtype,backend.disable_grad,backend.eval_mode,backend.amp_autocast,backend.amp_dtype,backend.torch_compile,backend.bettertransformer,backend.quantization_scheme,backend.use_ddp,backend.peft_strategy,benchmark.name,benchmark._target_,benchmark.duration,benchmark.warmup_runs,benchmark.memory,benchmark.energy,benchmark.input_shapes.batch_size,benchmark.input_shapes.sequence_length,benchmark.input_shapes.num_choices,benchmark.input_shapes.feature_size,benchmark.input_shapes.nb_max_frames,benchmark.input_shapes.audio_sequence_length,benchmark.new_tokens,benchmark.can_diffuse,benchmark.can_generate,benchmark.generate_kwargs.max_new_tokens,benchmark.generate_kwargs.min_new_tokens,benchmark.generate_kwargs.do_sample,benchmark.generate_kwargs.use_cache,benchmark.generate_kwargs.pad_token_id,benchmark.generate_kwargs.num_beams,model,device,task,hub_kwargs.revision,hub_kwargs.cache_dir,hub_kwargs.force_download,hub_kwargs.local_files_only,environment.optimum_version,environment.optimum_commit,environment.transformers_version,environment.transformers_commit,environment.accelerate_version,environment.accelerate_commit,environment.diffusers_version,environment.diffusers_commit,environment.python_version,environment.system,environment.cpu,environment.cpu_count,environment.cpu_ram_mb,environment.gpus,forward.latency(s),forward.throughput(samples/s),forward.max_memory_used(MB),forward.max_memory_allocated(MB),forward.max_memory_reserved(MB),generate.latency(s),generate.throughput(tokens/s),generate.max_memory_used(MB),generate.max_memory_allocated(MB),generate.max_memory_reserved(MB)
fp16-batch_size(16)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,16,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,meta-llama/Llama-2-7b-hf,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,['NVIDIA A100-SXM4-80GB'],0.402,39.8,26889,19191,24591,17.3,474.0,79295,26441,76996
fp16-batch_size(8)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,8,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,meta-llama/Llama-2-7b-hf,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,['NVIDIA A100-SXM4-80GB'],0.202,39.6,26369,16372,24071,14.0,293.0,32302,19990,30003
gptq-batch_size(16)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,16,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,TheBloke/Llama-2-7B-GPTQ,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,['NVIDIA A100-SXM4-80GB'],0.415,38.6,13652,9752,11353,24.6,333.0,83302,17001,81004
fp16-batch_size(4)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,4,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,meta-llama/Llama-2-7b-hf,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,['NVIDIA A100-SXM4-80GB'],0.106,37.7,25843,14962,23544,13.7,149.0,25843,16768,23544
gptq-batch_size(8)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,8,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,TheBloke/Llama-2-7B-GPTQ,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,['NVIDIA A100-SXM4-80GB'],0.22,36.4,10252,6933,7954,19.8,207.0,44107,10556,41808
fp16-batch_size(2)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,2,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,meta-llama/Llama-2-7b-hf,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,['NVIDIA A100-SXM4-80GB'],0.0574,34.8,25840,14258,23542,13.7,74.7,25840,15160,23542
gptq-batch_size(4)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,4,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,TheBloke/Llama-2-7B-GPTQ,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,['NVIDIA A100-SXM4-80GB'],0.121,33.1,8541,5524,6243,15.2,135.0,16399,7334,14101
fp16-batch_size(1)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,1,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,meta-llama/Llama-2-7b-hf,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,['NVIDIA A100-SXM4-80GB'],0.0325,30.8,25840,13906,23542,13.2,38.8,25840,14356,23542
gptq-batch_size(2)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,2,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,TheBloke/Llama-2-7B-GPTQ,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,['NVIDIA A100-SXM4-80GB'],0.0703,28.4,7167,4818,4869,15.0,68.3,8728,5722,6429
gptq-batch_size(1)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,1,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,TheBloke/Llama-2-7B-GPTQ,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,['NVIDIA A100-SXM4-80GB'],0.0457,21.9,6803,4467,4504,14.6,35.1,7599,4916,5301
fp16-batch_size(16)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,16,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,meta-llama/Llama-2-7b-hf,cuda:0,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,"['NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB']",0.402,39.8,19165,16520,17779,17.4,471.0,27988,26442,84511
fp16-batch_size(8)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,8,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,meta-llama/Llama-2-7b-hf,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,"['NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB']",0.204,39.2,17087,15037,15701,14.1,290.0,64889,19997,63503
gptq-batch_size(16)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,16,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,TheBloke/Llama-2-7B-GPTQ,cuda:0,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,"['NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB']",0.415,38.6,10900,7080,8604,24.6,333.0,65676,17002,83596
fp16-batch_size(4)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,4,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,meta-llama/Llama-2-7b-hf,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,"['NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB']",0.107,37.4,16022,14295,14636,13.9,147.0,26346,16774,24960
gptq-batch_size(8)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,8,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,TheBloke/Llama-2-7B-GPTQ,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,"['NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB']",0.223,35.9,8826,5597,6530,19.9,206.0,56629,10557,54333
fp16-batch_size(2)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,2,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,meta-llama/Llama-2-7b-hf,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,"['NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB']",0.0579,34.5,15392,13924,14006,13.6,75.3,17003,15162,15617
gptq-batch_size(4)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,4,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,TheBloke/Llama-2-7B-GPTQ,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,"['NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB']",0.122,32.8,7761,4855,5465,15.3,134.0,18085,7335,15789
fp16-batch_size(1)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,1,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,meta-llama/Llama-2-7b-hf,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,"['NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB']",0.0328,30.5,15153,13738,13767,13.5,37.9,15866,14356,14480
gptq-batch_size(2)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,2,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,TheBloke/Llama-2-7B-GPTQ,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,"['NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB']",0.0706,28.3,6872,4484,4575,15.4,66.5,8822,5722,6526
gptq-batch_size(1)-sequence_length(256)-new_tokens(512),pytorch,2.1.0+cu118,optimum_benchmark.backends.pytorch.backend.PyTorchBackend,42,,,False,False,False,False,,float16,True,True,False,,False,False,,False,,inference,optimum_benchmark.benchmarks.inference.benchmark.InferenceBenchmark,10,10,True,False,1,256,1,80,3000,16000,512,False,True,512,512,False,True,0,1,TheBloke/Llama-2-7B-GPTQ,cuda,text-generation,main,,False,False,1.13.2,,4.34.1,,0.24.1,,,,3.10.12,Linux, AMD EPYC 7742 64-Core Processor,128,540684,"['NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB', 'NVIDIA A100-SXM4-80GB']",0.0458,21.8,6746,4298,4450,14.8,34.6,7606,4916,5309
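
A report this wide is easier to read once it is reduced to a few columns. The following sketch, using only the Python standard library, compares GPTQ against fp16 on generation throughput and allocated memory. The inline `SAMPLE` keeps three columns from four of the rows in the 4-GPU block above; the helpers `load_rows` and `gptq_vs_fp16` are hypothetical and not part of optimum-benchmark.

```python
import csv
import io
import re

# Three columns copied from four report rows (batch sizes 1 and 16).
SAMPLE = """experiment_name,generate.throughput(tokens/s),generate.max_memory_allocated(MB)
fp16-batch_size(16)-sequence_length(256)-new_tokens(512),471.0,26442
gptq-batch_size(16)-sequence_length(256)-new_tokens(512),333.0,17002
fp16-batch_size(1)-sequence_length(256)-new_tokens(512),37.9,14356
gptq-batch_size(1)-sequence_length(256)-new_tokens(512),34.6,4916
"""

# Extract the quantization scheme and batch size from the experiment name.
NAME_RE = re.compile(r"^(?P<scheme>\w+)-batch_size\((?P<batch_size>\d+)\)")


def load_rows(text):
    """Index report rows by (scheme, batch_size)."""
    rows = {}
    for record in csv.DictReader(io.StringIO(text)):
        match = NAME_RE.match(record["experiment_name"])
        if match is None:
            continue  # skip rows whose name does not follow the expected pattern
        key = (match["scheme"], int(match["batch_size"]))
        rows[key] = {
            "throughput": float(record["generate.throughput(tokens/s)"]),
            "allocated_mb": float(record["generate.max_memory_allocated(MB)"]),
        }
    return rows


def gptq_vs_fp16(rows, batch_size):
    """Return GPTQ/fp16 ratios for generate throughput and allocated memory."""
    fp16 = rows[("fp16", batch_size)]
    gptq = rows[("gptq", batch_size)]
    return {
        "throughput_ratio": gptq["throughput"] / fp16["throughput"],
        "memory_ratio": gptq["allocated_mb"] / fp16["allocated_mb"],
    }


if __name__ == "__main__":
    rows = load_rows(SAMPLE)
    for batch_size in (1, 16):
        print(batch_size, gptq_vs_fp16(rows, batch_size))
```

To run it on the full file, pass the file's text to `load_rows` instead of `SAMPLE`; the column names used here match the report header.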