triton flash attention module generates wrong results #514
Comments
I tried your code on MI250 with ROCm 5.6 and ...
@zhanglx13 Thanks for your reply, I will try the latest one.
@zhanglx13 I updated to the latest one and the results seem OK now. Additionally, if I integrate Triton Flash Attention with vLLM, which Triton FA code should I use?
Let's ask @vgokhale for help.
You can use this one. Thanks!
@vgokhale Thanks for your reply. I just tested this code (python/perf-kernels/flash-attention.py) and it core dumps.
This is a bit weird - it completes the benchmark and core dumps somewhere after. These numbers are also pretty low, even for MI210. A few points:
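For context, a quick way to judge whether reported benchmark numbers are low is to convert the measured runtime into achieved TFLOP/s. Below is a minimal sketch of that arithmetic; the shapes and timing are purely illustrative and do not come from this thread.

```python
def attn_tflops(batch, heads, seqlen, head_dim, ms, causal=False):
    # Two matmuls (Q @ K^T and P @ V), each 2 * seqlen^2 * head_dim FLOPs per (batch, head).
    flops = 2 * 2 * batch * heads * seqlen * seqlen * head_dim
    if causal:
        flops //= 2  # roughly half the tiles are skipped under a causal mask
    return flops / (ms * 1e-3) / 1e12

# Illustrative numbers only (not the config from this thread):
print(f"{attn_tflops(4, 48, 4096, 64, 15.0):.1f} TFLOP/s")
```

Comparing that figure against the device's peak fp16 matmul throughput gives a rough sense of whether the kernel is underperforming.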
@vgokhale Thanks for your reply, I will update the ROCm version and try. Also, is there a Flash-Decoding algorithm implemented in Triton?
Hi @zhangxiao-stack, flash-attention.py should now run without the core dump you faced above. Can you try with the latest triton-mlir? Re: Flash-Decoding, we are working on this. We expect a first version this week.
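For background on what Flash-Decoding refers to here: the KV cache is processed in chunks, each chunk produces a partial attention output plus its log-sum-exp, and the partials are merged with a softmax-weighted reduction. Below is a minimal PyTorch sketch of that merge step; all shapes, layouts, and names are chosen for illustration and are not taken from the perf-kernels code.

```python
import torch

def flash_decode_reference(q, k, v, num_splits=4):
    """Single-query attention computed over KV chunks, then merged via log-sum-exp.
    q: (heads, d), k/v: (seq, heads, d). Illustrative layout, not the kernel's."""
    heads, d = q.shape
    scale = d ** -0.5
    outs, lses = [], []
    for k_chunk, v_chunk in zip(k.chunk(num_splits, dim=0), v.chunk(num_splits, dim=0)):
        s = torch.einsum("hd,nhd->hn", q, k_chunk) * scale    # scores: (heads, chunk_len)
        lses.append(torch.logsumexp(s, dim=-1))               # (heads,)
        p = torch.softmax(s, dim=-1)
        outs.append(torch.einsum("hn,nhd->hd", p, v_chunk))   # partial output: (heads, d)
    lses = torch.stack(lses, dim=0)                           # (splits, heads)
    outs = torch.stack(outs, dim=0)                           # (splits, heads, d)
    global_lse = torch.logsumexp(lses, dim=0)                 # (heads,)
    weights = torch.exp(lses - global_lse)                    # per-split softmax weight
    return (weights.unsqueeze(-1) * outs).sum(dim=0)          # (heads, d)
```

Weighting each partial output by exp(lse_i - global_lse) reproduces exactly what a single full-sequence softmax would give, which is the property split-K decode kernels rely on.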
Hi @vgokhale, ...
Hmm, these look as expected on an MI210. What baseline are you comparing against to call these low?
Hi @vgokhale. Sorry for the late reply, I got it wrong.
Does this result look OK?
What script are you using for flash decoding? I don't think we have one checked in at the top of the triton-mlir branch yet.
@vgokhale Hi, python/perf-kernels/06-attention-decode.py generates wrong results now.
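One way to pin down "wrong results" for a decode kernel is to compare it against PyTorch's reference attention on a single-query shape. A minimal sketch under assumed shapes follows; the actual entry point of 06-attention-decode.py may differ, so the Triton call is left as a commented placeholder.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, heads, kv_len, d = 4, 16, 1024, 64  # illustrative decode shapes
q = torch.randn(batch, heads, 1, d, device="cuda", dtype=torch.float16)
k = torch.randn(batch, heads, kv_len, d, device="cuda", dtype=torch.float16)
v = torch.randn(batch, heads, kv_len, d, device="cuda", dtype=torch.float16)

# Reference: PyTorch scaled dot-product attention computed in fp32 for a stable baseline.
ref = F.scaled_dot_product_attention(q.float(), k.float(), v.float()).to(q.dtype)

# out = attention_decode(q, k, v)   # placeholder for the Triton decode kernel call
# torch.testing.assert_close(out, ref, atol=2e-2, rtol=2e-2)
```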
Hi @scxiao, since you have sent a PR, I imagine it passes all the unit tests?
@vgokhale Hi, after making modifications based on pull request #541:
2. The official python/perf-kernels/flash-attention.py (unit test: test_op_bwd) fails at the (1, 16, 8192, 64) config with a memory access fault.
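To isolate a bwd failure at a single config such as (1, 16, 8192, 64), it can help to run only that shape against autograd gradients from PyTorch's own attention. A minimal sketch follows; the Triton `attention` call is a placeholder, since the real harness lives in flash-attention.py's test_op_bwd.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
Z, H, N, D = 1, 16, 8192, 64  # the failing config from test_op_bwd
q, k, v = (torch.randn(Z, H, N, D, device="cuda", dtype=torch.float16,
                       requires_grad=True) for _ in range(3))
do = torch.randn(Z, H, N, D, device="cuda", dtype=torch.float16)

# Eager reference gradients via autograd on PyTorch's attention.
ref_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
ref_out.backward(do)
ref_dq, ref_dk, ref_dv = q.grad.clone(), k.grad.clone(), v.grad.clone()
q.grad = k.grad = v.grad = None

# tri_out = attention(q, k, v, True, D ** -0.5)  # placeholder for the Triton kernel call
# tri_out.backward(do)
# torch.testing.assert_close(q.grad, ref_dq, atol=2e-2, rtol=2e-2)
```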
For the FA bwd kernel, please use python/tutorials/06-fused-attention.py. We are currently working on supporting bwd in the perf-kernels folder; until then, the tutorials folder is the right one to use for bwd.
Closing due to no new updates.
Problem Description
Hi,
Can somebody please take a look at this?
I just tested this code and it generates wrong results.
Operating System
NAME="Ubuntu" VERSION="20.04.5 LTS (Focal Fossa)"
CPU
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
GPU
AMD Instinct MI210
ROCm Version
ROCm 5.5.0
ROCm Component
No response
Steps to Reproduce
Steps
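The exact reproduction steps were not captured above. As a hedged reconstruction of the kind of check that surfaces wrong results, one can compare the kernel output against a naive fp32 softmax(QK^T/sqrt(d))V reference; the shapes and the commented Triton call below are assumptions, not taken from the report.

```python
import math
import torch

torch.manual_seed(0)
Z, H, N, D = 2, 8, 1024, 64  # illustrative shapes
q, k, v = (torch.randn(Z, H, N, D, device="cuda", dtype=torch.float16) for _ in range(3))

# Naive reference in fp32: softmax(Q K^T / sqrt(D)) V
p = torch.softmax(q.float() @ k.float().transpose(-2, -1) / math.sqrt(D), dim=-1)
ref = (p @ v.float()).to(q.dtype)

# tri_out = attention(q, k, v, False, 1.0 / math.sqrt(D))  # placeholder Triton call
# torch.testing.assert_close(tri_out, ref, atol=2e-2, rtol=2e-2)
```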
Errors
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
Additional Information
No response