tterrysun/cutlass-kernels

 
 

CUTLASS Kernels

A library of CUTLASS kernels targeting Large Language Models (LLMs).

Building

  1. Download CUTLASS following instructions from: https://github.com/NVIDIA/cutlass.
  2. Change the hardcoded path in the sample compile.sh so that it points to your CUTLASS directory.
  3. Run the modified compile.sh as ./compile.sh.
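
Before editing compile.sh, it can help to confirm that the directory you plan to point it at really is a CUTLASS checkout. The following is a minimal sketch, not part of this repository: the `CUTLASS_DIR` variable, its default location, and the `check_cutlass_dir` helper are hypothetical names chosen for illustration. It relies only on the fact that the CUTLASS repository ships its main header at include/cutlass/cutlass.h.

```shell
#!/bin/sh
# Hypothetical variable: where you cloned CUTLASS. Adjust to your setup, e.g. after
#   git clone https://github.com/NVIDIA/cutlass.git "$HOME/cutlass"
CUTLASS_DIR="${CUTLASS_DIR:-$HOME/cutlass}"

# Hypothetical helper: succeeds only if the directory contains the CUTLASS headers.
check_cutlass_dir() {
    [ -f "$1/include/cutlass/cutlass.h" ]
}

if check_cutlass_dir "$CUTLASS_DIR"; then
    echo "CUTLASS headers found in $CUTLASS_DIR"
else
    echo "CUTLASS headers not found in $CUTLASS_DIR" >&2
fi
```

If the check passes, use that same directory as the path in compile.sh and then run ./compile.sh.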

Running

  1. When running the executable, set the environment variable NVIDIA_TF32_OVERRIDE=1 to enable TF32 mode for cuBLAS SGEMM. Otherwise, cuBLAS computes in float32.
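
One way to set the variable for a single run, without exporting it for the rest of the shell session, is a small wrapper. This is a sketch only: the `run_with_tf32` function is a hypothetical helper, and `./my_kernel` in the usage comment is a placeholder for whichever executable compile.sh produces.

```shell
#!/bin/sh
# Hypothetical helper: run the given command with NVIDIA_TF32_OVERRIDE=1
# set only in that command's environment.
run_with_tf32() {
    env NVIDIA_TF32_OVERRIDE=1 "$@"
}

# Usage (placeholder executable name):
#   run_with_tf32 ./my_kernel
```

Because the variable applies only to the wrapped command, running the same executable directly afterwards gives the float32 cuBLAS baseline for comparison.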

Notes

  1. See README.md in sub-directories for more specific instructions.
