[How-to]How to get Flash-Attention under windows 11 CUDA #1469

Open

mytait opened this issue Jan 30, 2025 · 6 comments
mytait commented Jan 30, 2025

Here is a guide on how to get Flash Attention working under Windows, either by downloading a precompiled file or by compiling it yourself. It's not hard, but if you are completely new here, the information is not gathered in one central place.

I needed this under Windows, and "pip install flash-attn (--no-build-isolation)" does not work: you get half an hour of building until it crashes, either because it does not find torch (which is installed) or for other reasons. There are a couple of threads about this, but they describe old hacks (modifying files) that are no longer needed.

First of all: instead of compiling yourself (a full compile takes more than 2 hours on my 64 GB, 12-core machine), try downloading a precompiled lib from here. I can confirm these work:

https://github.com/bdashore3/flash-attention/releases

HOW TO COMPILE

I can confirm this works on my machine with the latest code as of Jan 2025.

If you need the latest version (currently 2.7.3) or a Python version not included above, read on.

You should have CUDA toolkit installed and working.
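To sanity-check this, the standard CUDA commands can be used (nvcc ships with the toolkit, nvidia-smi with the driver):

nvcc --version
nvidia-smi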

You will need a C++ compiler. If you don't have Visual C++ installed already, run the following in an administrator console. These commands install the C++ compiler silently; still, they are a couple of GB in size.

This installs the Windows 11 SDK and MSBuild tools needed to compile C++ libraries:

winget install --id=Microsoft.VisualStudio.2022.BuildTools --force --override "--wait --passive --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.Windows11SDK.22621" -e --silent --accept-package-agreements --accept-source-agreements

Alternatively, use this on Windows 10:
winget install --id=Microsoft.VisualStudio.2022.BuildTools --force --override "--wait --passive --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.Windows10SDK" -e --silent --accept-package-agreements --accept-source-agreements

Now create a Python virtual environment (for the Python version of your project! The resulting lib will be tied to that Python version) and activate it. How you do this depends on your setup, so I won't prescribe one command; a minimal sketch follows.
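One minimal sketch, assuming the standard venv module and a cmd shell (the environment name is just an example; use the interpreter matching your project's Python version):

python -m venv flash-build-env
flash-build-env\Scripts\activate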

Inside the environment, install the following libraries. Create a requirements.txt file with this content:

--extra-index-url=https://download.pytorch.org/whl/cu124
torch
einops
psutil
wheel
packaging
ninja

Those lines install torch from the PyTorch repo with CUDA 12.4 support. If you need another version, you can get the repo ID from the PyTorch site.
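For example, for a CUDA 12.1 build of torch, the index line would instead be (assuming PyTorch's usual wheel-index naming):

--extra-index-url=https://download.pytorch.org/whl/cu121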

Install that file as usual with:

pip install -r requirements.txt

Next, clone the flash-attention repository into the environment (a minimal clone sketch is shown below) and run the two build commands that follow from the newly cloned directory in administrator mode (otherwise you get "Filename too long" errors even when developer mode is on):
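A minimal clone sketch, assuming the upstream Dao-AILab repository:

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention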

python setup.py install (this took about 2 hours on my machine, during which all CPU cores were at max usage and the RAM was under heavy load)

python setup.py bdist_wheel (this takes about 1 minute)
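If the build exhausts your RAM, you can cap the number of parallel compile jobs; the flash-attention README documents the MAX_JOBS environment variable for exactly this. A sketch for a cmd shell (the value 4 is just an example to tune for your machine):

set MAX_JOBS=4
python setup.py install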

You will get a .whl file in:

\flash-attention\dist

The resulting .whl file can be installed in your target project's environment with:

pip install [path to wheelfilename].whl
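A quick way to verify the installed wheel imports correctly (a minimal check, assuming the package exposes __version__, which current releases do):

python -c "import flash_attn; print(flash_attn.__version__)"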

Hope this helps

@mytait mytait changed the title How to get Flash-Attention under windows 11 CUDA [How-to]How to get Flash-Attention under windows 11 CUDA Jan 30, 2025
werruww commented Jan 31, 2025

(base) C:\Windows\system32>conda activate my10

(my10) C:\Windows\system32>cd C:\Users\TARGET STORE\Desktop\1\flash-attention

(my10) C:\Users\TARGET STORE\Desktop\1\flash-attention>pip install flash_attn-2.7.1.post1+cu124torch2.4.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
Processing c:\users\target store\desktop\1\flash-attention\flash_attn-2.7.1.post1+cu124torch2.4.0cxx11abifalse-cp310-cp310-win_amd64.whl
Requirement already satisfied: torch in c:\programdata\anaconda3\envs\my10\lib\site-packages (from flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (2.1.0+cu121)
Requirement already satisfied: einops in c:\programdata\anaconda3\envs\my10\lib\site-packages (from flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (0.8.0)
Requirement already satisfied: filelock in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (3.13.1)
Requirement already satisfied: typing-extensions in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (4.12.2)
Requirement already satisfied: sympy in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (1.13.1)
Requirement already satisfied: networkx in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (3.3)
Requirement already satisfied: jinja2 in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (3.1.4)
Requirement already satisfied: fsspec in c:\programdata\anaconda3\envs\my10\lib\site-packages (from torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (2024.6.1)
Requirement already satisfied: MarkupSafe>=2.0 in c:\programdata\anaconda3\envs\my10\lib\site-packages (from jinja2->torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (2.1.3)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\programdata\anaconda3\envs\my10\lib\site-packages (from sympy->torch->flash-attn==2.7.1.post1+cu124torch2.4.0cxx11abiFALSE) (1.3.0)
Installing collected packages: flash-attn
Attempting uninstall: flash-attn
Found existing installation: flash-attn 2.3.2
Uninstalling flash-attn-2.3.2:
Successfully uninstalled flash-attn-2.3.2
Successfully installed flash-attn-2.7.1.post1

(my10) C:\Users\TARGET STORE\Desktop\1\flash-attention>python
Python 3.10.16 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:19:12) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> import flash-attn-
  File "<stdin>", line 1
    import flash-attn-
                ^
SyntaxError: invalid syntax
>>> import flash-attn
  File "<stdin>", line 1
    import flash-attn
                ^
SyntaxError: invalid syntax

>>> from flash_attn import flash_attn_qkvpacked_func, flash_attn_func
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\TARGET STORE\Desktop\1\flash-attention\flash_attn\__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "C:\Users\TARGET STORE\Desktop\1\flash-attention\flash_attn\flash_attn_interface.py", line 5, in <module>
    import torch
  File "C:\ProgramData\anaconda3\envs\my10\lib\site-packages\torch\__init__.py", line 137, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\ProgramData\anaconda3\envs\my10\lib\site-packages\torch\lib\nvfuser_codegen.dll" or one of its dependencies.

nightrun9 commented

Thanks, compiled successfully with torch 2.6.0 cu124.

Refer to the installation guide of triton-windows for related compiler issues.

anunknowperson commented

(ve) C:\Users\Admin\Desktop\bert\flash-attention>python setup.py install
Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'csrc/composable_kernel'
Cloning into 'C:/Users/Admin/Desktop/bert/flash-attention/csrc/composable_kernel'...
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_comp_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v1_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v2_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_comp_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_comp_kpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_comp_mnkpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_comp_mnpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v1_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v1_kpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v1_mnkpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v2_default_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v2_kpadding_instance.cpp: Filename too long
error: unable to create file library/src/tensor_operation_instance/gpu/gemm_universal_streamk/device_gemm_xdl_universal_streamk_bf16_bf16_bf16/device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v2_mnkpadding_instance.cpp: Filename too long


alex13by commented Feb 2, 2025

Did you compile successfully with torch 2.6.0 cu126?

mytait (Author) commented Feb 3, 2025

device_gemm_xdl_universal_streamk_bf16_bf16_bf16_km_kn_mn_mem_v2_mnkpadding_instance.cpp: Filename too long

You are compiling for ROCm, which I don't have experience with. Still:

You need to run the install command in a cmd window in admin mode, otherwise you get "Filename too long" errors.
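As a sketch of an alternative fix (not from this thread): since these errors occur while git checks out the composable_kernel submodule, enabling long paths in Git may also help:

git config --system core.longpaths true

Windows itself may additionally need the system-wide LongPathsEnabled setting.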

Did you compile successfully with torch 2.6.0 cu126?

I compiled with torch 2.6.0 cu124.

@werruww, you are using a binary that is not one of the builds I linked (your wheel name says it was built for torch 2.4.0+cu124, but your log shows torch 2.1.0+cu121 installed). Try the ones I linked to.
