v0.21.0
awni released this on 22 Nov at 20:18
Highlights
Support for 3- and 6-bit quantization (benchmarks)
Much faster memory-efficient attention for head dims 64 and 80 (benchmarks)
Much faster SDPA inference kernel for longer sequences (benchmarks)
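The new 3- and 6-bit modes extend the existing group-wise affine quantization. The NumPy sketch below is illustrative only (not MLX's actual kernel, and the function names are hypothetical): each group of weights is mapped to integers in [0, 2**bits - 1] with a per-group scale and bias.

```python
import numpy as np

def quantize(w, group_size=64, bits=3):
    """Group-wise affine quantization sketch (not MLX's exact kernel).

    Each row is split into groups of `group_size`; each group is mapped
    to integers in [0, 2**bits - 1] with its own scale and bias.
    """
    levels = 2 ** bits - 1
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    # Guard against constant groups (scale == 0).
    q = np.round((g - lo) / np.where(scale == 0, 1, scale)).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo, shape):
    # Reconstruct an approximation of the original weights.
    return (q * scale + lo).reshape(shape)

w = np.random.default_rng(0).normal(size=(4, 128)).astype(np.float32)
q, s, b = quantize(w, bits=3)
w_hat = dequantize(q, s, b, w.shape)
```

With 3 bits there are only 8 levels per group, so the round-trip error of any element is bounded by half a quantization step of its group.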
Core
`contiguous` op (C++ only) + primitive
Limit BFS width during graph evaluation to reduce memory consumption
Fast CPU quantization
Faster indexing math in several kernels: `unary`, `binary`, `ternary`, `copy`, `compiled`, `reduce`
Improve thread dispatch for a few kernels: conv, gemm split-k, custom kernels
More buffer donation with no-ops to reduce memory use
Use `CMAKE_OSX_DEPLOYMENT_TARGET` to pick the Metal version
Dispatch Metal bf16 type at runtime when using the JIT
NN
`nn.AvgPool3d` and `nn.MaxPool3d`
Support `groups` in `nn.Conv2d`
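With `groups`, the input and output channels are split into independent groups that are convolved separately (the grouped-conv weight carries only `in_channels / groups` input channels per filter). The naive NumPy sketch below is illustrative only (stride 1, no padding, NHWC layout as in MLX), not MLX's implementation:

```python
import numpy as np

def grouped_conv2d(x, w, groups=1):
    """Naive NHWC grouped convolution (stride 1, no padding).

    x: (N, H, W, C_in); w: (C_out, kH, kW, C_in // groups).
    Each group of output channels sees only its own slice of inputs.
    """
    N, H, W, C_in = x.shape
    C_out, kH, kW, _ = w.shape
    assert C_in % groups == 0 and C_out % groups == 0
    ci, co = C_in // groups, C_out // groups
    out = np.zeros((N, H - kH + 1, W - kW + 1, C_out), dtype=x.dtype)
    for g in range(groups):
        xg = x[..., g * ci:(g + 1) * ci]      # this group's input channels
        wg = w[g * co:(g + 1) * co]           # this group's filters
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = xg[:, i:i + kH, j:j + kW, :]  # (N, kH, kW, ci)
                out[:, i, j, g * co:(g + 1) * co] = np.einsum(
                    'nhwc,ohwc->no', patch, wg)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4, 4, 4)).astype(np.float32)
w = rng.normal(size=(4, 2, 2, 2)).astype(np.float32)  # groups=2: 2 in-ch per filter
y = grouped_conv2d(x, w, groups=2)
```

A grouped convolution is equivalent to a full convolution whose weight is block-diagonal across the channel groups, which is why `groups=C_in` gives a depthwise convolution.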
Bug fixes
Fix per-example mask handling and docs in `sdpa`
Fix FFT synchronization bug (use dispatch method everywhere)
Throw for invalid `*fft{2,n}` cases
Fix out-of-bounds access in `qmv`
Fix donation in sdpa to reduce memory use
Allocate safetensors header on the heap to avoid stack overflow
Fix sibling memory leak
Fix `view` segfault for scalar inputs
Fix `vmap` over `concatenate`
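The per-example mask fix concerns scaled dot-product attention, where the mask is added to the attention scores before the softmax and may be broadcast per batch element. A reference NumPy implementation (illustrative only, not MLX's fused kernel):

```python
import numpy as np

def sdpa(q, k, v, mask=None):
    """Reference scaled dot-product attention with an optional additive
    mask that is broadcastable to (B, heads, L, L), e.g. per-example.
    q, k, v: (B, heads, L, D)."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.einsum('bhld,bhmd->bhlm', q, k) * scale
    if mask is not None:
        scores = scores + mask        # masked positions carry large negatives
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return np.einsum('bhlm,bhmd->bhld', w, v)

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 1, 3, 2))
k = rng.normal(size=(1, 1, 3, 2))
v = rng.normal(size=(1, 1, 3, 2))
causal = np.triu(np.full((3, 3), -1e9), 1)  # causal mask, broadcasts over batch
out = sdpa(q, k, v, mask=causal)
```

Under a causal mask the first query position can attend only to the first key, so its output equals the first value vector.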