ggml : rewrite silu and softmax for cpu #111

Nexesenex · 2024-05-10T17:52:46Z

This change upstreams llamafile's vectorized expf() functions, which let us compute softmax and silu more accurately than the short[65536] lookup table GGML previously used to speed up these operations. Both aarch64 and SSE2+ are supported, with a worst-case rounding error of 2 ulp. It makes `make -j8 tests && ./tests/test-backend-ops -o SOFT_MAX -b CPU perf` run 1.5x faster with SSE2+FMA, 1.9x faster with AVX2+FMA, and 2.1x faster with AVX512.

Co-authored-by: Iwan Kawrakow <[email protected]>

Nexesenex merged commit 2e02dba into Nexesenex:sidestream May 10, 2024
11 of 14 checks passed

Nexesenex pushed a commit that referenced this pull request Dec 22, 2024

Use fused mul - unary op also for MoE models (#111)

5ad6439

Co-authored-by: Iwan Kawrakow <[email protected]>
