Releases: facebookresearch/xformers
`v0.0.29.post2` - build for PyTorch 2.6.0
Pre-built binary wheels are available for PyTorch 2.6.0. Following PyTorch, we build wheels for CUDA 11.8, 12.4, and 12.6 only (we no longer build for CUDA 12.1).
xFormers now requires PyTorch >= 2.6
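A quick way to check this requirement after installing (a minimal sketch; `python -m xformers.info` prints a fuller report of the build and the available operators):

```python
import torch
import xformers

print("torch:", torch.__version__)        # expected >= 2.6.0 for this release
print("xformers:", xformers.__version__)  # e.g. 0.0.29.post2
```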
[v0.0.29.post1] Fix Flash2 on Windows
This fixes the issue reported in a comment on #1163.
FAv3 enabled by default, deprecated components removed
Pre-built binary wheels require PyTorch 2.5.1
Improved:
- [fMHA] Creating a `LowerTriangularMask` no longer creates a CUDA tensor
- [fMHA] Updated Flash-Attention to `v2.7.2.post1`
- [fMHA] Flash-Attention v3 will now be used by `memory_efficient_attention` by default when available, unless the operator is enforced with the `op` keyword argument (see the sketch after this list). Switching from Flash2 to Flash3 can make transformer trainings ~10% faster end-to-end on H100s
- [fMHA] Fixed a performance regression with the `cutlass` backend for the backward pass (#1176) - mostly used on older GPUs (e.g. V100)
- Fixed `swiglu` operator compatibility with torch.compile with PyTorch 2.6
- Fixed activation checkpointing of SwiGLU when AMP is enabled (#1152)
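A minimal sketch of the backend selection described above, assuming CUDA, fp16 inputs, and illustrative shapes: by default `memory_efficient_attention` picks the best available operator (Flash3 on H100s when present), and the `op` keyword argument pins it to a specific one such as Flash2.

```python
import torch
from xformers.ops import memory_efficient_attention, LowerTriangularMask
from xformers.ops import fmha

# [batch, seq_len, heads, head_dim] - illustrative shapes
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Default dispatch: the fastest available backend is chosen automatically.
out_auto = memory_efficient_attention(q, k, v, attn_bias=LowerTriangularMask())

# Enforce the Flash2 forward/backward operators via the `op` keyword argument.
out_flash2 = memory_efficient_attention(
    q, k, v,
    attn_bias=LowerTriangularMask(),
    op=(fmha.flash.FwOp, fmha.flash.BwOp),
)
```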
Removed:
- Following PyTorch, xFormers no longer builds binaries for conda. Pip is now the only recommended way to get xFormers
- Removed unmaintained/deprecated components in `xformers.components.*` (see #848)
`v0.0.28.post3` - build for PyTorch 2.5.1
[0.0.28.post3] - 2024-10-30
Pre-built binary wheels require PyTorch 2.5.1
`v0.0.28.post2` - build for PyTorch 2.5.0
[0.0.28.post2] - 2024-10-18
Pre-built binary wheels require PyTorch 2.5.0
`v0.0.28.post1` - fixing upload for CUDA 12.4 wheels
[0.0.28.post1] - 2024-09-13
Properly upload wheels for CUDA 12.4
FAv3, profiler update & AMD
Pre-built binary wheels require PyTorch 2.4.1
Added
- Added wheels for CUDA 12.4
- Added conda builds for Python 3.11
- Added wheels for ROCm 6.1
Improved
- Profiler: Fix computation of FLOPS for the attention when using xFormers
- Profiler: Fix MFU/HFU calculation when multiple dtypes are used
- Profiler: Trace analysis to compute MFU & HFU is now much faster
- fMHA/splitK: Fixed `nan` in the output when using a `torch.Tensor` bias where a lot of consecutive keys are masked with `-inf` (see the sketch after this list)
- Update Flash-Attention version to `v2.6.3` when building from scratch
- When using the most recent version of Flash-Attention, it is no longer possible to mix it with the cutlass backend. In other words, it is no longer possible to use the cutlass Fw with the flash Bw.
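A minimal sketch, with illustrative shapes, of the kind of additive `torch.Tensor` bias the splitK fix above refers to: a decode-style query attending to keys where a long run of consecutive positions is masked with `-inf`.

```python
import torch
from xformers.ops import memory_efficient_attention

B, H, Mq, Mk, K = 2, 8, 1, 128, 64  # illustrative sizes
q = torch.randn(B, Mq, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, Mk, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, Mk, H, K, device="cuda", dtype=torch.float16)

# Additive bias of shape [B, H, Mq, Mk]; keys 64..127 are masked out with -inf.
bias = torch.zeros(B, H, Mq, Mk, device="cuda", dtype=torch.float16)
bias[..., 64:] = float("-inf")

out = memory_efficient_attention(q, k, v, attn_bias=bias)
```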
Removed
- fMHA: Removed `decoder` and `small_k` backends
- profiler: Removed `DetectSlowOpsProfiler` profiler
- Removed compatibility with PyTorch < 2.4
- Removed conda builds for Python 3.9
- Removed Windows pip wheels for CUDA 12.1 and 11.8
torch.compile support, bug fixes & more
Pre-built binary wheels require PyTorch 2.4.0
Added
- fMHA: `PagedBlockDiagonalGappyKeysMask`
- fMHA: heterogeneous queries in `triton_splitk`
- fMHA: support for paged attention in flash
- fMHA: Added backwards pass for `merge_attentions`
- fMHA: Added `torch.compile` support for 3 biases (`LowerTriangularMask`, `LowerTriangularMaskWithTensorBias` and `BlockDiagonalMask`) - some might require PyTorch 2.4
- fMHA: Added `torch.compile` support in `memory_efficient_attention` when passing the flash operator explicitly (e.g. `memory_efficient_attention(..., op=(flash.FwOp, flash.BwOp))`) - see the sketch after this list
- fMHA: `memory_efficient_attention` now expects its `attn_bias` argument to be on the same device as the other input tensors. Previously, it would convert the bias to the right device.
- fMHA: `AttentionBias` subclasses are now constructed by default on the `cuda` device if available - they used to be created on the CPU device
- 2:4 sparsity: Added `xformers.ops.sp24.sparsify24_ste` for Straight Through Estimator (STE) with options to rescale the gradient differently for masked out/kept values
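A minimal sketch of the `torch.compile` support called out above, assuming CUDA, fp16 inputs, and illustrative shapes: the attention call passes the flash operator explicitly, uses a `LowerTriangularMask` bias, and both forward and backward run through the compiled function.

```python
import torch
from xformers.ops import memory_efficient_attention, LowerTriangularMask
from xformers.ops import fmha

def attention(q, k, v):
    # Flash operator passed explicitly, causal bias via LowerTriangularMask.
    return memory_efficient_attention(
        q, k, v,
        attn_bias=LowerTriangularMask(),
        op=(fmha.flash.FwOp, fmha.flash.BwOp),
    )

compiled_attention = torch.compile(attention)

# [batch, seq_len, heads, head_dim] - illustrative shapes
q, k, v = (
    torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16, requires_grad=True)
    for _ in range(3)
)
out = compiled_attention(q, k, v)
out.sum().backward()  # backward also runs through the compiled graph
```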
Improved
- fMHA: Fixed out-of-bounds reading for Split-K triton implementation
- Profiler: fix bug with modules that take a single tuple as argument
- Profiler: Added manual trigger for a profiling step, by creating a trigger file in the profiling directory
Removed
- Removed support for PyTorch version older than 2.2.0
[v0.0.27] torch.compile support, bug fixes & more
Added
- fMHA: `PagedBlockDiagonalGappyKeysMask`
- fMHA: heterogeneous queries in `triton_splitk`
- fMHA: support for paged attention in flash
- fMHA: Added backwards pass for `merge_attentions`
- fMHA: Added `torch.compile` support for 3 biases (`LowerTriangularMask`, `LowerTriangularMaskWithTensorBias` and `BlockDiagonalMask`) - some might require PyTorch 2.4
- fMHA: Added `torch.compile` support in `memory_efficient_attention` when passing the flash operator explicitly (e.g. `memory_efficient_attention(..., op=(flash.FwOp, flash.BwOp))`)
- fMHA: `memory_efficient_attention` now expects its `attn_bias` argument to be on the same device as the other input tensors. Previously, it would convert the bias to the right device.
- fMHA: `AttentionBias` subclasses are now constructed by default on the `cuda` device if available - they used to be created on the CPU device
- 2:4 sparsity: Added `xformers.ops.sp24.sparsify24_ste` for Straight Through Estimator (STE) with options to rescale the gradient differently for masked out/kept values
Improved
- fMHA: Fixed out-of-bounds reading for Split-K triton implementation
- Profiler: fix bug with modules that take a single tuple as argument
- Profiler: Added manual trigger for a profiling step, by creating a `trigger` file in the profiling directory
Removed
- Removed support for PyTorch version older than 2.2.0