I ran a couple of benchmarks after noticing a slowdown between v1.78 and v1.79.1:
Running GeForce 4070 Super 12 GB with 32 GB RAM
1.78
Processing Prompt [BLAS] (16284 / 16284 tokens)
Generating (100 / 100 tokens)
CtxLimit:16384/16384, Amt:100/100, Init:0.07s, Process:21.22s (1.3ms/T = 767.28T/s), Generate:23.24s (232.4ms/T = 4.30T/s), Total:44.46s (2.25T/s)
Benchmark Completed - v1.78 Results:
======
Flags: NoAVX2=False Threads=8 HighPriority=False Cublas_Args=['lowvram', '0', 'mmq'] Tensor_Split=None BlasThreads=16 BlasBatchSize=512 FlashAttention=True KvCache=0
Timestamp: 2024-12-14 20:41:03.368064+00:00
Backend: koboldcpp_cublas.dll
Layers: 59
Model: Cydonia-22B-v1.3.i1-IQ4_XS
MaxCtx: 16384
GenAmount: 100
-----
ProcessingTime: 21.223s
ProcessingSpeed: 767.28T/s
GenerationTime: 23.240s
GenerationSpeed: 4.30T/s
TotalTime: 44.463s
Output: 1 1 1 1
-----
1.79.1
Processing Prompt [BLAS] (16284 / 16284 tokens)
Generating (100 / 100 tokens)
[11:31:34] CtxLimit:16384/16384, Amt:100/100, Init:0.06s, Process:27.73s (1.7ms/T = 587.23T/s), Generate:38.02s (380.2ms/T = 2.63T/s), Total:65.75s (1.52T/s)
Benchmark Completed - v1.79.1 Results:
======
Flags: NoAVX2=False Threads=8 HighPriority=False Cublas_Args=['lowvram', '0', 'mmq'] Tensor_Split=None BlasThreads=16 BlasBatchSize=512 FlashAttention=True KvCache=0
Timestamp: 2024-12-14 16:31:34.073011+00:00
Backend: koboldcpp_cublas.dll
Layers: 59
Model: Cydonia-22B-v1.3.i1-IQ4_XS
MaxCtx: 16384
GenAmount: 100
-----
ProcessingTime: 27.730s
ProcessingSpeed: 587.23T/s
GenerationTime: 38.024s
GenerationSpeed: 2.63T/s
TotalTime: 65.754s
Output: 1 1 1 1
-----
Notice the roughly 21-second increase in total time, from 44.46s on 1.78 to 65.75s on 1.79.1, with the same flags, model, and context size.
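To put the gap in relative terms, here is a small sketch that computes the change for each reported metric. The numbers are copied from the two benchmark results in this report; the dictionary keys and the `pct_change` helper are just illustrative names, not anything KoboldCpp provides.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Figures copied from the v1.78 and v1.79.1 benchmark results in this report.
# The key names are made up for this sketch.
v1_78 = {"processing_tps": 767.28, "generation_tps": 4.30, "total_s": 44.463}
v1_79_1 = {"processing_tps": 587.23, "generation_tps": 2.63, "total_s": 65.754}

for key in v1_78:
    old, new = v1_78[key], v1_79_1[key]
    print(f"{key}: {old} -> {new} ({pct_change(old, new):+.1f}%)")

# Absolute difference in total wall-clock time, in seconds.
extra_seconds = v1_79_1["total_s"] - v1_78["total_s"]
print(f"extra total time: {extra_seconds:.2f}s")
```

On these figures, prompt processing is about 23% slower, generation about 39% slower, and total time about 48% longer. Generation speed takes the larger relative hit.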
I tested here and also found 1.78 to be slightly faster, but in my case the difference was minuscule: 1.79 T/s on v1.78 versus 1.76 T/s on v1.79.
My system is fairly similar, just with less VRAM: a 3070 Ti with 8 GB VRAM and 32 GB RAM.
How about v1.80?