ggml : rewrite silu and softmax for cpu
This change upstreams llamafile's vectorized expf() functions. This lets
us compute softmax and silu more accurately than the short[65536] lookup
table that GGML previously used to speed up these operations. We can
support aarch64 and SSE2+ with a worst-case rounding error of 2 ulp. The
benchmark

    make -j8 tests && ./tests/test-backend-ops -o SOFT_MAX -b CPU perf

runs 1.5x faster on SSE2+FMA, 1.9x faster on AVX2+FMA, and 2.1x faster
on AVX512.
jart committed May 10, 2024
1 parent f98eb31 commit d7359a3
Showing 1 changed file with 283 additions and 193 deletions.