Support BF16 kvcache, rope and attentions for inference of GGUF/GGML models #4387
| Job | Run time |
|---|---|
| | 2m 3s |
| | 1m 43s |
| | 14s |
| | 6m 18s |
| | 2m 51s |
| | 1m 32s |
| | 5m 51s |
| | 4s |
| | 9m 16s |
| | 3m 43s |
| **Total** | 33m 35s |