##############################
out: max 3.046875, mean 0.04052734375
lse: max 9.003575325012207, mean 7.748651027679443
out diff:
[0] max 0.0009765625, mean 2.372264862060547e-05
[1] max 0.001953125, mean 6.031990051269531e-05
lse diff:
[0] max 1.9073486328125e-06, mean 1.4424823291392386e-07
[1] max 1.9073486328125e-06, mean 2.30388167210549e-07
##############################
backward:
##############################
load_dq:
[0] max 2.828125, mean 0.04736328125
[1] max 0.3828125, mean 0.0308837890625
dq diff:
[0] max 0.001953125, mean 2.002716064453125e-05
[1] max 0.001953125, mean 5.555152893066406e-05
load_dk:
[0] max 2.578125, mean 0.04052734375
[1] max 0.31640625, mean 0.02197265625
dk0 diff:
[0] max 0.015625, mean 6.103515625e-05
[1] max 0.001953125, mean 3.814697265625e-05
load_dv:
[0] max 4.84375, mean 0.042236328125
[1] max 0.287109375, mean 0.02197265625
dv diff:
[0] max 0.03125, mean 5.984306335449219e-05
[1] max 0.0009765625, mean 3.528594970703125e-05
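The statistics above appear to be max/mean absolute differences between two implementations' outputs and gradients (the `[0]`/`[1]` indices presumably index two ranks or output chunks). A minimal sketch, with assumed tensor names, of how such numbers could be produced:

```python
import torch

def diff_stats(label: str, a: torch.Tensor, b: torch.Tensor) -> None:
    """Print max/mean absolute difference between two tensors, matching the log format above."""
    d = (a - b).abs()
    print(f"{label}: max {d.max().item()}, mean {d.mean().item()}")

# Hypothetical usage: compare a reference output against the ring-attention output, per chunk
# diff_stats("out diff [0]", out_ref.chunk(2, dim=1)[0], out_ring.chunk(2, dim=1)[0])
```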
lucidrains & zhuzilin have been hard at work over the last few days and have completed the following two ring-attention implementations:
Create a test setup that verifies correctness and compares the performance of both solutions (a rough sketch of such a setup follows after these tasks).
Phil decided to use a custom Triton kernel. Find out why this kernel is used and whether it is indeed faster than the CUDA FlashAttention-2 kernel (see the benchmark sketch at the end).
Please generate a short report of your findings, either as a markdown file or an ipynb notebook.
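A minimal sketch of one possible test setup, assuming each implementation can be wrapped as a `(q, k, v) -> out` callable; the commented-out import names are guesses rather than the repos' confirmed entry points, and a real ring-attention run additionally needs a `torch.distributed` process group (e.g. launched with `torchrun`), which is omitted here:

```python
import time
import torch
import torch.nn.functional as F

def reference_attention(q, k, v):
    # Plain single-device attention as ground truth for the correctness check.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

def check_correctness(impl, name, q, k, v, atol=2e-3):
    out_ref = reference_attention(q, k, v)
    out = impl(q, k, v)  # assumed call signature
    diff = (out - out_ref).abs()
    print(f"{name}: max diff {diff.max().item():.3e}, mean diff {diff.mean().item():.3e}")
    assert diff.max().item() <= atol, f"{name} deviates from the reference"

def benchmark(impl, name, q, k, v, iters=20):
    # Warm up, then time forward passes; synchronize so CUDA timings are real.
    for _ in range(3):
        impl(q, k, v)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        impl(q, k, v)
    torch.cuda.synchronize()
    print(f"{name}: {(time.perf_counter() - start) / iters * 1e3:.2f} ms/iter")

if __name__ == "__main__":
    torch.manual_seed(0)
    # (batch, heads, seq, head_dim) layout assumed; adjust to each repo's convention.
    q, k, v = (torch.randn(2, 8, 4096, 64, device="cuda", dtype=torch.bfloat16)
               for _ in range(3))
    # Assumed imports -- check the repos for the actual entry points:
    # from ring_attention_pytorch import ring_flash_attn   # lucidrains
    # from ring_flash_attn import ring_flash_attn_func     # zhuzilin
    # for name, impl in [("lucidrains", ...), ("zhuzilin", ...)]:
    #     check_correctness(impl, name, q, k, v)
    #     benchmark(impl, name, q, k, v)
```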
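For the Triton-vs-CUDA question, a micro-benchmark along these lines could separate the kernels themselves from the ring communication; `flash_attn_func` (from the flash-attn package) and `triton.testing.do_bench` are real APIs, while the import path for the custom Triton kernel is only a placeholder for wherever it lives in lucidrains' repo:

```python
import torch
from triton.testing import do_bench
from flash_attn import flash_attn_func  # CUDA FlashAttention-2

# Placeholder: import the custom Triton kernel from lucidrains' repo here.

# flash_attn_func expects (batch, seqlen, nheads, headdim) in fp16/bf16 on CUDA.
q, k, v = (torch.randn(2, 4096, 8, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

cuda_ms = do_bench(lambda: flash_attn_func(q, k, v, causal=True))
print(f"CUDA FlashAttention-2 forward: {cuda_ms:.3f} ms")
# triton_ms = do_bench(lambda: <triton kernel call>(q, k, v))  # fill in the real call
# print(f"Custom Triton kernel forward: {triton_ms:.3f} ms")
```

The `lse` values in the log above suggest the implementations carry log-sum-exp statistics across ring steps; whether the custom Triton kernel exists to expose that accumulation (something the stock CUDA kernel's public interface may not) is exactly the kind of question the report should answer.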