CUDA: add FP32 FlashAttention vector kernel #100

Job Run time
2m 28s
2m 8s
2m 27s
2m 1s
2m 31s
1m 37s
5m 50s
3m 3s
5m 44s
2m 29s
1m 47s
14s
2m 1s
14s
15s
1m 22s
21m 45s
16s
5m 50s
12m 6s
5m 54s
14m 40s
1s
6m 14s
21m 9s
5m 10s
8m 35s
6m 22s
5m 32s
7m 57s
6m 14s
5m 19s
6m 9s
0s
2h 55m 24s