SIMD for GF(2^16) #80

Open

burdges opened this issue Feb 4, 2021 · 2 comments

Comments

burdges (Collaborator) commented Feb 4, 2021

We invoke C code for SIMD in galois_8.rs, but galois_16.rs merely uses galois_8.rs.

There is a more bespoke approach to GF(2^16) in "Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions" by James Plank, Kevin Greenan, and Ethan Miller.

@drskalman You looked back into the gf-complete C code recently, yes? I think https://github.com/ceph/gf-complete/blob/master/src/gf_w16.c implements their proposed SIMD for GF(2^16), so could you tell if it's faster?

Replaces #48, I think.
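
For concreteness, here is a minimal scalar sketch of the nibble-table idea from that paper: to multiply a whole buffer by one constant, precompute a 16-entry table per 4-bit nibble of the operand and combine the lookups with XOR; in the SIMD versions each 16-entry byte table becomes a single PSHUFB/TBL lookup. The reduction polynomial 0x1100B (gf-complete's usual choice for w=16) and all names below are assumptions for illustration, not this crate's code.

```rust
// Assumed reduction polynomial for GF(2^16): x^16 + x^12 + x^3 + x + 1.
const POLY: u32 = 0x1100B;

// Plain shift-and-add multiply in GF(2^16), used only to build the tables.
fn gf2p16_mul(mut a: u32, mut b: u32) -> u16 {
    let mut r = 0u32;
    while b != 0 {
        if b & 1 != 0 {
            r ^= a;
        }
        a <<= 1;
        if a & 0x10000 != 0 {
            a ^= POLY;
        }
        b >>= 1;
    }
    r as u16
}

// Multiply every element of `buf` by the constant `c` via four 16-entry tables,
// one per 4-bit nibble of the operand: tables[j][n] = c * (n << 4j).
fn mul_buffer_by_constant(buf: &mut [u16], c: u16) {
    let mut tables = [[0u16; 16]; 4];
    for j in 0..4 {
        for n in 0..16u32 {
            tables[j][n as usize] = gf2p16_mul(c as u32, n << (4 * j));
        }
    }
    for x in buf.iter_mut() {
        let v = *x as usize;
        *x = tables[0][v & 0xF]
            ^ tables[1][(v >> 4) & 0xF]
            ^ tables[2][(v >> 8) & 0xF]
            ^ tables[3][v >> 12];
    }
}
```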

AndersTrier commented

> There is a more bespoke approach to GF(2^16) in "Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions" by James Plank, Kevin Greenan, and Ethan Miller.

I've implemented exactly that in pure Rust for AVX2, SSSE3 and Neon. Feel free to draw inspiration from https://github.com/AndersTrier/reed-solomon-simd/tree/master/src/engine
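
For reference, a hedged SSSE3 sketch of what the inner kernel can look like (not copied from reed-solomon-simd, which organizes its data and engines differently): it assumes the 16-bit words are stored split, 16 low bytes in one block and the corresponding 16 high bytes in another, and that the eight 16-entry nibble tables for the constant (low and high output byte for each of the four input nibbles) have already been built, as in the scalar sketch above, and loaded into `tab_lo`/`tab_hi`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Multiply 16 GF(2^16) words (split into low-byte and high-byte blocks) by the
// constant whose eight 16-entry nibble tables are already in registers.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn mul16_ssse3(
    lo: &mut [u8; 16],    // low bytes of the 16 words
    hi: &mut [u8; 16],    // high bytes of the 16 words
    tab_lo: [__m128i; 4], // low output byte, one table per input nibble
    tab_hi: [__m128i; 4], // high output byte, one table per input nibble
) {
    let mask = _mm_set1_epi8(0x0F);
    let in_lo = _mm_loadu_si128(lo.as_ptr() as *const __m128i);
    let in_hi = _mm_loadu_si128(hi.as_ptr() as *const __m128i);

    // The four 4-bit nibbles of every word, one vector per nibble position.
    let n0 = _mm_and_si128(in_lo, mask);
    let n1 = _mm_and_si128(_mm_srli_epi64::<4>(in_lo), mask);
    let n2 = _mm_and_si128(in_hi, mask);
    let n3 = _mm_and_si128(_mm_srli_epi64::<4>(in_hi), mask);

    // PSHUFB performs 16 parallel table lookups; XOR combines partial products.
    let out_lo = _mm_xor_si128(
        _mm_xor_si128(_mm_shuffle_epi8(tab_lo[0], n0), _mm_shuffle_epi8(tab_lo[1], n1)),
        _mm_xor_si128(_mm_shuffle_epi8(tab_lo[2], n2), _mm_shuffle_epi8(tab_lo[3], n3)),
    );
    let out_hi = _mm_xor_si128(
        _mm_xor_si128(_mm_shuffle_epi8(tab_hi[0], n0), _mm_shuffle_epi8(tab_hi[1], n1)),
        _mm_xor_si128(_mm_shuffle_epi8(tab_hi[2], n2), _mm_shuffle_epi8(tab_hi[3], n3)),
    );

    _mm_storeu_si128(lo.as_mut_ptr() as *mut __m128i, out_lo);
    _mm_storeu_si128(hi.as_mut_ptr() as *mut __m128i, out_hi);
}
```

The same lookup maps to `vqtbl1q_u8` on NEON, and the AVX2 variant does the same with `_mm256_shuffle_epi8` across two 128-bit lanes.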

burdges (Collaborator, Author) commented Nov 26, 2023

At some point I figured out that the approach taken by this crate winds up being obsolete: https://hackmd.io/@rgbPIkIdTwSICPuAq67Jbw/r1B8eRC1u

In https://github.com/paritytech/reed-solomon-novelpoly we ported the leopard C code to Rust, which uses additive FFTs and the formal derivative trick, and we fixed its handling of GF(2^16) and low rates. AFAIK everyone else who cared about erasure coding performance uses the leopard C code directly.
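
As a small aside on the "formal derivative trick": in characteristic 2 the standard derivative of p(x) = Σ aᵢ xⁱ simply keeps the odd-degree coefficients, since i·aᵢ vanishes for even i. A minimal sketch of that standard derivative only; the variant applied alongside leopard's additive FFT basis is more involved.

```rust
// Formal derivative of p(x) = a[0] + a[1] x + a[2] x^2 + ... over a field of
// characteristic 2: the term i * a[i] * x^(i-1) survives only when i is odd.
fn formal_derivative(a: &[u16]) -> Vec<u16> {
    (1..a.len())
        .map(|i| if i % 2 == 1 { a[i] } else { 0 })
        .collect()
}
```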

In https://github.com/w3f/rs-ec-perf we had various optimization ideas: some, like GFNI, we attempted only half-heartedly, and others we never tried at all, given other priorities.

I'd conjecture the optimization really worth doing would be implementing GF(2^12) as a degree-three extension of GF(2^4), with the extension arithmetic done in the log-table form. Maybe also GF(2^16) as a degree-four extension of GF(2^4) in the same style, but that becomes really nasty. Instead of big tables, these need only a handful of GF(2^4) tables, so they avoid cache pressure. It's possible this outperforms in production, even if the big-table implementations win in pure erasure-coding benchmarks.
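
A minimal sketch of what that tower arithmetic could look like, representing GF(2^12) as GF(2^4)[y] / (y^3 + y + 1), with GF(2^4) defined by x^4 + x + 1. Both polynomials are assumptions chosen purely for illustration (the cubic has no roots in GF(2^4), hence is irreducible there); none of this is existing code in the crate.

```rust
const GF16_POLY: u8 = 0x13; // x^4 + x + 1, an assumed polynomial for GF(2^4)

// exp/log tables for GF(2^4); 15 is the order of the multiplicative group,
// and exp is doubled so the sum of two logs never needs a modulo.
fn gf2p4_tables() -> ([u8; 16], [u8; 32]) {
    let mut log = [0u8; 16];
    let mut exp = [0u8; 32];
    let mut v: u8 = 1;
    for i in 0..15 {
        exp[i] = v;
        exp[i + 15] = v;
        log[v as usize] = i as u8;
        v <<= 1;
        if v & 0x10 != 0 {
            v ^= GF16_POLY;
        }
    }
    (log, exp)
}

fn gf2p4_mul(a: u8, b: u8, log: &[u8; 16], exp: &[u8; 32]) -> u8 {
    if a == 0 || b == 0 {
        0
    } else {
        exp[log[a as usize] as usize + log[b as usize] as usize]
    }
}

// Multiply two GF(2^12) elements written as degree-<3 polynomials [c0, c1, c2]
// over GF(2^4), reducing modulo y^3 + y + 1 (so y^3 = y + 1 and y^4 = y^2 + y).
fn gf2p12_mul(a: [u8; 3], b: [u8; 3], log: &[u8; 16], exp: &[u8; 32]) -> [u8; 3] {
    let mut t = [0u8; 5]; // schoolbook product before reduction
    for i in 0..3 {
        for j in 0..3 {
            t[i + j] ^= gf2p4_mul(a[i], b[j], log, exp);
        }
    }
    t[2] ^= t[4]; // y^4 -> y^2 + y
    t[1] ^= t[4];
    t[1] ^= t[3]; // y^3 -> y + 1
    t[0] ^= t[3];
    [t[0], t[1], t[2]]
}
```

The table footprint here is a few dozen bytes, versus the hundreds of kilobytes a full GF(2^16) log/exp table pair occupies, which is the cache-pressure argument above.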
