Slowdown on processing prompt upgrading from 1.78 to 1.79 #1266

Tedris · 2024-12-14T20:57:00Z

I ran a couple of benchmarks after noticing a slowdown between the two versions:

Running a GeForce RTX 4070 Super (12 GB VRAM) with 32 GB RAM.

1.78

Processing Prompt [BLAS] (16284 / 16284 tokens)
Generating (100 / 100 tokens)
CtxLimit:16384/16384, Amt:100/100, Init:0.07s, Process:21.22s (1.3ms/T = 767.28T/s), Generate:23.24s (232.4ms/T = 4.30T/s), Total:44.46s (2.25T/s)
Benchmark Completed - v1.78 Results:
======
Flags: NoAVX2=False Threads=8 HighPriority=False Cublas_Args=['lowvram', '0', 'mmq'] Tensor_Split=None BlasThreads=16 BlasBatchSize=512 FlashAttention=True KvCache=0
Timestamp: 2024-12-14 20:41:03.368064+00:00
Backend: koboldcpp_cublas.dll
Layers: 59
Model: Cydonia-22B-v1.3.i1-IQ4_XS
MaxCtx: 16384
GenAmount: 100
-----
ProcessingTime: 21.223s
ProcessingSpeed: 767.28T/s
GenerationTime: 23.240s
GenerationSpeed: 4.30T/s
TotalTime: 44.463s
Output:  1 1 1 1
-----

1.79.1

Processing Prompt [BLAS] (16284 / 16284 tokens)
Generating (100 / 100 tokens)
[11:31:34] CtxLimit:16384/16384, Amt:100/100, Init:0.06s, Process:27.73s (1.7ms/T = 587.23T/s), Generate:38.02s (380.2ms/T = 2.63T/s), Total:65.75s (1.52T/s)
Benchmark Completed - v1.79.1 Results:
======
Flags: NoAVX2=False Threads=8 HighPriority=False Cublas_Args=['lowvram', '0', 'mmq'] Tensor_Split=None BlasThreads=16 BlasBatchSize=512 FlashAttention=True KvCache=0
Timestamp: 2024-12-14 16:31:34.073011+00:00
Backend: koboldcpp_cublas.dll
Layers: 59
Model: Cydonia-22B-v1.3.i1-IQ4_XS
MaxCtx: 16384
GenAmount: 100
-----
ProcessingTime: 27.730s
ProcessingSpeed: 587.23T/s
GenerationTime: 38.024s
GenerationSpeed: 2.63T/s
TotalTime: 65.754s
Output:  1 1 1 1
-----

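Notice the roughly 21-second increase in total time from 1.78 to 1.79.1 (44.46s to 65.75s). Prompt processing dropped from 767.28 T/s to 587.23 T/s, and generation dropped from 4.30 T/s to 2.63 T/s.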
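For anyone who wants to compare their own runs the same way, here is a rough sketch that pulls the timing lines out of two benchmark result blocks and prints the relative change for each metric. The helper names `parse_results` and `percent_change` are made up for this example and are not part of koboldcpp; the regex assumes the `Name: value` layout with an `s` or `T/s` suffix shown in the results above.

```python
import re

# Timing lines copied from the two benchmark result blocks in this report.
V178 = """ProcessingTime: 21.223s
ProcessingSpeed: 767.28T/s
GenerationTime: 23.240s
GenerationSpeed: 4.30T/s
TotalTime: 44.463s"""

V1791 = """ProcessingTime: 27.730s
ProcessingSpeed: 587.23T/s
GenerationTime: 38.024s
GenerationSpeed: 2.63T/s
TotalTime: 65.754s"""


def parse_results(text: str) -> dict[str, float]:
    """Collect 'Name: <number>s' and 'Name: <number>T/s' lines into a dict.

    Hypothetical helper: lines without an 's' or 'T/s' suffix
    (Layers, MaxCtx, Timestamp, ...) are skipped.
    """
    pattern = re.compile(r"^(\w+):\s*([\d.]+)(?:s|T/s)$", re.MULTILINE)
    return {name: float(value) for name, value in pattern.findall(text)}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


old = parse_results(V178)
new = parse_results(V1791)

for key in old:
    change = percent_change(old[key], new[key])
    print(f"{key}: {old[key]} -> {new[key]} ({change:+.1f}%)")
```

Pasting a full results block in place of the strings above should work as well, since the non-timing lines do not match the pattern.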


3750gustavo · 2024-12-15T15:18:13Z

I tested here and also found 1.78 slightly faster, but in my case the difference was minuscule: 1.79 T/s on v1.78 versus 1.76 T/s on v1.79.

My system is fairly similar, just with less VRAM: a 3070 Ti with 8 GB VRAM and 32 GB RAM.

LostRuins · 2024-12-20T06:04:15Z

How about v1.80?
