forked from Dao-AILab/flash-attention
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fp8 BWD after figuring out varlen problem
This is a combination of 21 commits.
fp8 BWD
Enable BWD fp8 with split kernel
Enable BWD fp8 with per block scale factors for p and ds
This is a combination of 9 commits.
Enable BWD fp8
This is a combination of 12 commits.
add backward test case
save
clean up
disable ci
lse is good
dv matches
reduce diff
use do fp8 for dv
kinda working
group size is a constexpr
clean up a bit
everything except mqa/gqa works
skip mqa cases
20 cases have nan on dropout
save what you have
disable tests failing
enable tests
per block descale_p and descale_ds
use max(abs(())
clean up tests a bit more
fix bug
disable ci for now
pass variables
add flags
add alternate path. Still need to load descale factors
dv working
dk works
save
add type info for backward
fix DEBUG flag bug
fix bug with backward. Normal forward works with dropout. Segfault with causal. Varlen has some issues. Might be related to strides.
pass descale strides
test causal
fix causal compiler assert. min head should be 32
remove descale_p
save
explict name as causal
isolate bad case
just run fp8 tests
bench with autotune
min changes
cast_fp8 helper
cast_varlen_to_fp8
save
minor
highlight failing configs
increase test cases
mark failing
recategorize misc tests
group failing gqa configs
add more tests
add vis code
min ci changes
dump folder
single image per tensors
add tensor comparison
gen varlen tensor
vis varlen tensors
varlen diff
nice varlen vis
vis function
show seqlen in varlen
add vis_tensors function
simplify
add color bars
rm vis from test
set canvas size. descale values are optional
add ck tests
add flag to build ck
rm ck test
assert requires grad
ensure q, k, and v require gradients
split vis
rm interp, 8k and 300 dpi
slice per page
disable ci for now
add more vis code
tensor per image is better
for vis_close, don't vis if no error. also vis all failing varlen tests
varlen failures due to different seqlens
rm vis code
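The message mentions per-block `descale_p` and `descale_ds` factors derived from `max(abs(...))`. As a rough illustration of that idea only, and not the kernel code from this commit, the plain-Python sketch below computes one descale factor per block so that each block fits an fp8 range. The names `per_block_descale`, `dequantize`, and `FP8_E4M3_MAX` are hypothetical, the value 448 assumes the e4m3fn format (the commit may target a different fp8 variant), and the actual rounding to fp8 is omitted.

```python
# Hypothetical sketch: per-block scaling driven by max(abs(block)).
# Assumption: fp8 e4m3fn, whose largest finite value is 448.
FP8_E4M3_MAX = 448.0


def per_block_descale(values, block_size, fp8_max=FP8_E4M3_MAX):
    """Scale each block into [-fp8_max, fp8_max].

    Returns (scaled_blocks, descales); multiplying a scaled value by its
    block's descale factor recovers the original value.
    """
    scaled_blocks, descales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # An all-zero block needs no scaling; use 1.0 to avoid dividing by zero.
        descale = amax / fp8_max if amax > 0.0 else 1.0
        scaled_blocks.append([v / descale for v in block])
        descales.append(descale)
    return scaled_blocks, descales


def dequantize(scaled_blocks, descales):
    """Undo the per-block scaling and flatten back to one list."""
    return [v * d for block, d in zip(scaled_blocks, descales) for v in block]
```

Keeping one factor per block rather than one per tensor means a single outlier only costs precision within its own block.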
1 parent 37dd11e, commit 567c282
Showing 10 changed files with 945 additions and 462 deletions.