CUDA: add FP32 FlashAttention vector kernel #100
| Job | Run time |
|---|---|
| | 2m 28s |
| | 2m 8s |
| | 2m 27s |
| | 2m 1s |
| | 2m 31s |
| | 1m 37s |
| | 5m 50s |
| | 3m 3s |
| | 5m 44s |
| | 2m 29s |
| | 1m 47s |
| | 14s |
| | 2m 1s |
| | 14s |
| | 15s |
| | 1m 22s |
| | 21m 45s |
| | 16s |
| | 5m 50s |
| | 12m 6s |
| | 5m 54s |
| | 14m 40s |
| | 1s |
| | 6m 14s |
| | 21m 9s |
| | 5m 10s |
| | 8m 35s |
| | 6m 22s |
| | 5m 32s |
| | 7m 57s |
| | 6m 14s |
| | 5m 19s |
| | 6m 9s |
| | 0s |
| Total | 2h 55m 24s |