Releases: facebookresearch/xformers

`v0.0.29.post2` - build for PyTorch 2.6.0

31 Jan 20:42

Pre-built binary wheels are available for PyTorch 2.6.0. Following PyTorch, we build wheels for CUDA 11.8, 12.4, and 12.6 only (we no longer build for CUDA 12.1).
xFormers now requires PyTorch >= 2.6

[v0.0.29.post1] Fix Flash2 on Windows

31 Dec 10:10

This fixes the issue reported in a comment on #1163.

Enabled FAv3 by default, removed deprecated components

27 Dec 09:39

Pre-built binary wheels require PyTorch 2.5.1

Improved:

  • [fMHA] Creating a LowerTriangularMask no longer creates a CUDA tensor
  • [fMHA] Updated Flash-Attention to v2.7.2.post1
  • [fMHA] Flash-Attention v3 is now used by memory_efficient_attention by default when available, unless a specific operator is enforced with the op keyword argument (see the sketch after this list). Switching from Flash2 to Flash3 can make end-to-end transformer training ~10% faster on H100s
  • [fMHA] Fixed a performance regression in the backward pass of the cutlass backend (#1176), which is mostly used on older GPUs (e.g. V100)
  • Fixed swiglu operator compatibility with torch.compile on PyTorch 2.6
  • Fixed activation checkpointing of SwiGLU when AMP is enabled (#1152)
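
A minimal sketch (not taken from the release notes) of the dispatch behavior described above: by default, memory_efficient_attention prefers Flash-Attention v3 when it is available, while passing op= pins a specific backend such as Flash2. The fmha.flash module path is assumed from the public xformers.ops.fmha namespace.

```python
# Hedged sketch: default dispatch vs. enforcing a backend via the op argument.
import torch
import xformers.ops as xops
from xformers.ops import fmha

q = torch.randn(1, 4096, 8, 128, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 4096, 8, 128, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 4096, 8, 128, device="cuda", dtype=torch.bfloat16)

# Default dispatch: Flash-Attention v3 is preferred when available (e.g. on H100).
out = xops.memory_efficient_attention(q, k, v)

# Enforce Flash-Attention v2 instead by passing the operator pair explicitly.
out_flash2 = xops.memory_efficient_attention(
    q, k, v, op=(fmha.flash.FwOp, fmha.flash.BwOp)
)
```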

Removed:

  • Following PyTorch, xFormers no longer builds binaries for conda. Pip is now the only recommended way to get xFormers
  • Removed unmaintained/deprecated components in xformers.components.* (see #848)

`v0.0.28.post3` - build for PyTorch 2.5.1

30 Oct 17:56

[0.0.28.post3] - 2024-10-30

Pre-built binary wheels require PyTorch 2.5.1

`v0.0.28.post2` - build for PyTorch 2.5.0

22 Oct 11:13

[0.0.28.post2] - 2024-10-18

Pre-built binary wheels require PyTorch 2.5.0

`0.0.28.post1` - fixing upload for CUDA 12.4 wheels

13 Sep 15:52

[0.0.28.post1] - 2024-09-13

Properly upload wheels for CUDA 12.4

FAv3, profiler update & AMD

12 Sep 15:49

Pre-built binary wheels require PyTorch 2.4.1

Added

  • Added wheels for CUDA 12.4
  • Added conda builds for Python 3.11
  • Added wheels for ROCm 6.1

Improved

  • Profiler: Fixed the FLOPS computation for attention when using xFormers
  • Profiler: Fixed the MFU/HFU calculation when multiple dtypes are used
  • Profiler: Trace analysis to compute MFU & HFU is now much faster
  • fMHA/splitK: Fixed NaNs in the output when using a torch.Tensor bias where many consecutive keys are masked with -inf
  • Updated Flash-Attention to v2.6.3 when building from source
  • When using the most recent version of Flash-Attention, it is no longer possible to mix it with the cutlass backend; in other words, the cutlass forward can no longer be combined with the flash backward (see the sketch after this list)
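
To make the constraint above concrete, here is a hedged sketch of how backends are selected through the op tuple; the fmha.flash and fmha.cutlass module paths are assumed from the public API, and the mixed pairing shown last is the combination that is no longer supported.

```python
# Sketch of the backend-mixing constraint described above (module paths assumed).
import torch
import xformers.ops as xops
from xformers.ops import fmha

q = torch.randn(1, 2048, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Matching forward/backward pairs are fine:
out_cutlass = xops.memory_efficient_attention(
    q, k, v, op=(fmha.cutlass.FwOp, fmha.cutlass.BwOp)
)

# With a recent Flash-Attention build, mixing the cutlass forward with the
# flash backward is rejected, so the following is no longer supported:
# xops.memory_efficient_attention(q, k, v, op=(fmha.cutlass.FwOp, fmha.flash.BwOp))
```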

Removed

  • fMHA: Removed decoder and small_k backends
  • profiler: Removed the DetectSlowOpsProfiler profiler
  • Removed compatibility with PyTorch < 2.4
  • Removed conda builds for Python 3.9
  • Removed Windows pip wheels for CUDA 12.1 and 11.8

torch.compile support, bug fixes & more

26 Jul 15:41

Pre-built binary wheels require PyTorch 2.4.0

Added

  • fMHA: PagedBlockDiagonalGappyKeysMask
  • fMHA: heterogeneous queries in triton_splitk
  • fMHA: support for paged attention in flash
  • fMHA: Added backwards pass for merge_attentions
  • fMHA: Added torch.compile support for 3 biases (LowerTriangularMask, LowerTriangularMaskWithTensorBias and BlockDiagonalMask) - some might require PyTorch 2.4
  • fMHA: Added torch.compile support in memory_efficient_attention when the flash operator is passed explicitly (e.g., memory_efficient_attention(..., op=(flash.FwOp, flash.BwOp))); see the sketch after this list
  • fMHA: memory_efficient_attention now expects its attn_bias argument to be on the same device as the other input tensors. Previously, it would convert the bias to the right device.
  • fMHA: AttentionBias subclasses are now constructed by default on the cuda device if available - they used to be created on the CPU device
  • 2:4 sparsity: Added xformers.ops.sp24.sparsify24_ste for Straight Through Estimator (STE) with options to rescale the gradient differently for masked out/kept values
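
A short, hedged sketch of the compile path described above, assuming the public xformers.ops and xformers.ops.fmha modules; LowerTriangularMask and the explicit flash operators are the names given in the list, everything else is illustrative.

```python
# Sketch only: compile a wrapper that calls memory_efficient_attention with the
# flash operator passed explicitly, using one of the supported bias types.
import torch
import xformers.ops as xops
from xformers.ops import fmha

def attention(q, k, v, bias=None):
    return xops.memory_efficient_attention(
        q, k, v, attn_bias=bias, op=(fmha.flash.FwOp, fmha.flash.BwOp)
    )

compiled_attention = torch.compile(attention)

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# LowerTriangularMask is one of the three biases with torch.compile support.
# AttentionBias subclasses are now created on the cuda device by default, and
# the bias must already live on the same device as q/k/v.
out = compiled_attention(q, k, v, bias=xops.LowerTriangularMask())
```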

Improved

  • fMHA: Fixed out-of-bounds reading for Split-K triton implementation
  • Profiler: Fixed a bug with modules that take a single tuple as argument
  • Profiler: Added manual trigger for a profiling step, by creating a trigger file in the profiling directory

Removed

  • Removed support for PyTorch versions older than 2.2.0

torch.compile support, bug fixes & more

25 Jul 11:59

Pre-built binary wheels require PyTorch 2.4.0

The release notes are otherwise identical to those of the 26 Jul release above.

[v0.0.27] torch.compile support, bug fixes & more

09 Jul 16:35

The release notes are identical to those of the 26 Jul release above.