[How-to]How to get Flash-Attention under windows 11 CUDA #1469

Open

mytait opened this issue Jan 30, 2025 · 6 comments
mytait commented Jan 30, 2025

Here is a guide on how to get Flash Attention working under Windows, either by downloading a precompiled file or by compiling it yourself. It's not hard, but if you are completely new here, the information is not gathered in one central place.

I needed this under Windows, and "pip install flash-attn (--no-build-isolation)" does not work: you get half an hour of building until it crashes, either because it does not find torch (which is installed) or for other reasons. There are a couple of threads about this, but they describe old hacks (modifying files) that are no longer needed.

First of all: instead of compiling yourself (a full compile takes more than 2 hours on my 64 GB, 12-core machine), try downloading a precompiled lib from here. I can confirm these work:

https://github.com/bdashore3/flash-attention/releases

HOW TO COMPILE

I can confirm this works on my machine with the latest code as of Jan 2025.

If you need the latest version (currently 2.7.3) or a Python version not included above, read on.

You should have CUDA toolkit installed and working.
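To sanity-check this, the standard CUDA commands can be used (nvcc ships with the toolkit, nvidia-smi with the driver):

nvcc --version
nvidia-smi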

You will need a C++ compiler. If you don't have Visual C++ installed already, run the following in an administrator console. These commands install the C++ compiler silently; still, they are a couple of GB in size.

This installs the Windows 11 SDK and MSBuild tools needed to compile C++ libraries:

winget install --id=Microsoft.VisualStudio.2022.BuildTools --force --override "--wait --passive --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.Windows11SDK.22621" -e --silent --accept-package-agreements --accept-source-agreements

Alternatively, use this on Windows 10:
winget install --id=Microsoft.VisualStudio.2022.BuildTools --force --override "--wait --passive --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.Windows10SDK" -e --silent --accept-package-agreements --accept-source-agreements

Now create a Python virtual environment (for the Python version of your project! The resulting lib will be tied to that Python version) and activate it. How you do this depends on your setup, so I won't prescribe one command; a minimal sketch follows.
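One minimal sketch, assuming the standard venv module and a cmd shell (the environment name is just an example; use the interpreter matching your project's Python version):

python -m venv flash-build-env
flash-build-env\Scripts\activate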

Inside the environment, install the following libraries. Create a requirements.txt file with this content:

--extra-index-url=https://download.pytorch.org/whl/cu124
torch
einops
psutil
wheel
packaging
ninja

Those lines install torch from the PyTorch repo with CUDA 12.4 support. If you need another version, you can get the repo ID from the PyTorch site.
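For example, for a CUDA 12.1 build of torch, the index line would instead be (assuming PyTorch's usual wheel-index naming):

--extra-index-url=https://download.pytorch.org/whl/cu121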

Install that file as usual with:

pip install -r requirements.txt

Next, clone the flash-attention repository into the environment (a minimal clone sketch is shown below) and run the two build commands that follow from the newly cloned directory in administrator mode (otherwise you get "Filename too long" errors even when developer mode is on):
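A minimal clone sketch, assuming the upstream Dao-AILab repository:

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention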

python setup.py install (this took about 2 hours on my machine, during which all CPU cores were at max usage and the RAM was under heavy load)

python setup.py bdist_wheel (this takes about 1 minute)
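If the build exhausts your RAM, you can cap the number of parallel compile jobs; the flash-attention README documents the MAX_JOBS environment variable for exactly this. A sketch for a cmd shell (the value 4 is just an example to tune for your machine):

set MAX_JOBS=4
python setup.py install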

You will get a .whl file in:

\flash-attention\dist

The resulting .whl file can be installed in your target project's environment with:

pip install [path to wheelfilename].whl
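A quick way to verify the installed wheel imports correctly (a minimal check, assuming the package exposes __version__, which current releases do):

python -c "import flash_attn; print(flash_attn.__version__)"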

Hope this helps

@mytait mytait changed the title How to get Flash-Attention under windows 11 CUDA [How-to]How to get Flash-Attention under windows 11 CUDA Jan 30, 2025
werruww commented Jan 31, 2025

(base) C:\Windows\system32>conda activate my10

(my10) C:\Windows\system32>cd C:\Users\TARGET STORE\Desktop\1\flash-attention

(my10) C:\Users\TARGET STORE\Desktop\1\flash-attention>pip install flash_attn-2.7.1.post1+cu124torch2.4.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
Processing c:\users\target store\desktop\1\flash-attention\flash_attn-2.7.1.post1+cu124torch2.4.0cxx11abifalse-cp310-cp310-win_amd64.whl
Requirement already satisfied: torch in c:\programdata\anaconda3\envs\my10\lib\site-packages (from flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (2.1.0+cu121)
Requirement already satisfied: einops in c:\programdata\anaconda3\envs\my10\lib\site-packages (from flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (0.8.0)
Requirement already satisfied: filelock in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (3.13.1)
Requirement already satisfied: typing-extensions in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (4.12.2)
Requirement already satisfied: sympy in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (1.13.1)
Requirement already satisfied: networkx in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (3.3)
Requirement already satisfied: jinja2 in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (3.1.4)
Requirement already satisfied: fsspec in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (2024.6.1)
Requirement already satisfied: MarkupSafe>=2.0 in c:\programdata\anaconda3\envs\my10\lib\site-packages (from jinja2->torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (2.1.3)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\programdata\anaconda3\envs\my10\lib\site-packages (from sympy->torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (1.3.0)
Installing collected packages: flash-attn
Attempting uninstall: flash-attn
Found existing installation: flash-attn 2.3.2
Uninstalling flash-attn-2.3.2:
Successfully uninstalled flash-attn-2.3.2
Successfully installed flash-attn-2.7.1.post1

(my10) C:\Users\TARGET STORE\Desktop\1\flash-attention>python
Python 3.10.16 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:19:12) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> import flash-attn-
  File "<stdin>", line 1
    import flash-attn-
                ^
SyntaxError: invalid syntax
>>> import flash-attn
  File "<stdin>", line 1
    import flash-attn
                ^
SyntaxError: invalid syntax

>>> from flash_attn import flash_attn_qkvpacked_func, flash_attn_func
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\TARGET STORE\Desktop\1\flash-attention\flash_attn\__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "C:\Users\TARGET STORE\Desktop\1\flash-attention\flash_attn\flash_attn_interface.py", line 5, in <module>
    import torch
  File "C:\ProgramData\anaconda3\envs\my10\lib\site-packages\torch\__init__.py", line 137, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\ProgramData\anaconda3\envs\my10\lib\site-packages\torch\lib\nvfuser_codegen.dll" or one of its dependencies.

nightrun9 commented

Thanks, compiled successfully with torch 2.6.0 cu124.

Refer to the installation guide of triton-windows for related compiler issues.

anunknowperson commented

(ve) C:\Users\Admin\Desktop\bert\flash-attention>python setup.py install
Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'csrc/composable_kernel'
Cloning into 'C:/Users/Admin/Desktop/bert/flash-attention/csrc/composable_kernel'...
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_comp_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v1_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v2_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_comp_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_comp_kpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_comp_mnkpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_comp_mnpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v1_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v1_kpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v1_mnkpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v2_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v2_kpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v2_mnkpadding_instance.cpp: Filename too long


alex13by commented Feb 2, 2025

Did you compile successfully with torch 2.6.0 cu126?

mytait (Author) commented Feb 3, 2025

device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v2_mnkpadding_instance.cpp: Filename too long

You are compiling for ROCm, which I don't have experience with. Still:

You need to run the install command in a cmd window in admin mode, otherwise you get "Filename too long" errors.
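As a sketch of an alternative fix (not from this thread): since these errors occur while git checks out the composable_kernel submodule, enabling long paths in Git may also help:

git config --system core.longpaths true

Windows itself may additionally need the system-wide LongPathsEnabled setting.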

Did you compile successfully with torch 2.6.0 cu126?

I compiled with torch 2.6.0 cu124.

@werruww, you are using a binary that is not one of the builds I linked (your wheel name says it was built for torch 2.4.0+cu124, but your log shows torch 2.1.0+cu121 installed). Try the ones I linked to.
