Skip to content

Latest commit

 

History

History
17 lines (10 loc) · 550 Bytes

README.md

File metadata and controls

17 lines (10 loc) · 550 Bytes

CUTLASS Kernels

Library of CUTLASS kernels targeting Large Language Models (LLM).

Building

  1. Download CUTLASS following instructions from: https://github.com/NVIDIA/cutlass.
  2. Modify the (hardcoded) path in the sample compile.sh to your CUTLASS directory.
  3. Run the modified compile.sh as ./compile.sh.

Running

  1. While running the executable make sure to set NVIDIA_TF32_OVERRIDE=1 to enable TF32 mode for cuBLAS for SGEMM. Otherwise, cuBLAS uses float32.

Notes

  1. See README.md in sub-directories for more specific instructions.