[GEMM codegen] Add e2e pipeline #316

Open · wants to merge 49 commits into main
Conversation

@Xinyu302 Xinyu302 commented Jun 7, 2024

This PR does the following:

  1. gpu-to-vector and NVVM lowering (see GPUVectorToGPU.cpp and NVVMCodegen.cpp)
  2. Handle dynamic shared memory by modifying gpu.launch (see GPUToNVVM.cpp, LegalizeGPULaunch.cpp, and FuncToByre.cpp)
  3. GPU software pipelining using upstream code (not the best approach; we could add ldmatrix prefetch here)
  4. Debug GPUPackSharedMemory. Since we use predicated cp.async, before storing to C we must also wait for the last few zero-length copies to sharedA and sharedB (see GPUPackSharedMemory.cpp)
  5. All passes use memref.alloc, because upstream passes do not handle memref.alloca
  6. Add f32 codegen support (using mma.sync.aligned.m16n8k8.row.col.f32.tf32.tf32.f32), but with limited precision (TODO: support TF32_3x via upstream populateMmaSyncF32ToTF32Patterns)
  7. Rename getForallOpMappedTo2DBlock to getForallOpMappedToBlock, since we now support bmm
  8. Build the lowering pipeline in linalg-tensor-opt (annotate the lowering config and tiling) and linalg-memref-opt (where gemm is mainly optimized)
  9. Add barriers in promotion (see LinalgPromotion.cpp)
  10. Add an outlineDotOp option to byteir.compile to control whether gemm codegen is used; we outline each single dot op to trigger codegen
  11. Fix a bug in GPUBlockSwizzle.cpp and support bmm swizzle
  12. For now, we use a hack to rewrite the linalg.fill + linalg.matmul + linalg.generic (epilogue) pattern at the tensor level, which supports epilogue fusion
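On item 3, the point of software pipelining is to overlap cp.async copies with compute by issuing each copy `stages - 1` iterations ahead of the compute step that consumes it. A minimal Python sketch of that schedule (function and event names here are illustrative, not the upstream pass's API):

```python
def pipeline_schedule(n_iters: int, stages: int):
    """Emit (action, iteration) events for a software-pipelined loop.

    Copies run `stages - 1` iterations ahead of the compute they feed,
    so the copy for iteration i is always issued before compute(i).
    """
    events = []
    # Prologue: issue the first `stages - 1` async copies up front.
    for i in range(min(stages - 1, n_iters)):
        events.append(("copy", i))
    # Steady state: each compute step first issues a copy for a future
    # iteration (if one remains), then computes on already-copied data.
    for i in range(n_iters):
        if i + stages - 1 < n_iters:
            events.append(("copy", i + stages - 1))
        events.append(("compute", i))
    return events

# With 3 stages, copies for iterations 0 and 1 are issued before any
# compute, keeping two buffers in flight at all times.
print(pipeline_schedule(5, 3))
```

The epilogue falls out naturally: the final `stages - 1` compute steps issue no new copies, they just drain the buffers already in flight.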
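On item 6, the precision limitation comes from the TF32 format itself: it keeps fp32's 8 exponent bits but only 10 explicit mantissa bits (fp32 has 23), so mma.sync tf32 inputs lose the low mantissa bits. A small Python sketch emulating the conversion by truncation (real hardware rounds to nearest; truncation is a simplification for illustration):

```python
import struct

def to_tf32(x: float) -> float:
    """Emulate fp32 -> TF32 by zeroing the low 13 mantissa bits.

    TF32 keeps 8 exponent bits and 10 mantissa bits, so any
    information in the low 13 fp32 mantissa bits is lost.
    (Hardware rounds to nearest; we truncate for simplicity.)
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFE000))[0]

# Values representable in 10 mantissa bits survive; finer ones do not.
print(to_tf32(1.0 + 2**-10))  # 1.0009765625 (bit kept)
print(to_tf32(1.0 + 2**-11))  # 1.0 (bit dropped)
```

This is also why the TODO mentions TF32_3x: splitting each fp32 operand into a high and a low TF32 part and combining three mma.sync products recovers most of the lost precision at roughly 3x the cost.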
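On item 11, the general idea behind block swizzling (not necessarily the exact mapping in GPUBlockSwizzle.cpp) is to remap the linear block id so that blocks resident on the GPU at the same time work on nearby C tiles and therefore share A/B tiles in L2. A hedged Python sketch of a grouped swizzle, where `group_m` is a hypothetical tuning parameter:

```python
def swizzle(pid: int, grid_m: int, grid_n: int, group_m: int):
    """Remap a linear block id to a (tile_m, tile_n) coordinate.

    Consecutive block ids walk down a column of `group_m` row-tiles
    before moving to the next column, so co-scheduled blocks reuse
    the same A rows and B columns from L2. Illustrative only.
    """
    width = group_m * grid_n
    group_id = pid // width
    # The last group may be shorter when grid_m % group_m != 0.
    group_size = min(grid_m - group_id * group_m, group_m)
    pid_m = group_id * group_m + pid % group_size
    pid_n = (pid % width) // group_size
    return pid_m, pid_n
```

The mapping is a bijection over the grid, so every C tile is still produced exactly once; only the launch-order-to-tile assignment changes.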

@Xinyu302 Xinyu302 marked this pull request as draft June 7, 2024 09:46
delete multibufferext

delete gpu pipelining
@Xinyu302 Xinyu302 marked this pull request as ready for review July 2, 2024 19:19
@Xinyu302 Xinyu302 changed the title [WIP][GEMM codegen] Add e2e pipeline [GEMM codegen] Add e2e pipeline Jul 2, 2024
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Xinyu302
❌ yangxinyu


yangxinyu does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
