SIMD for GF(2^16) #80

Open

burdges opened this issue Feb 4, 2021 · 2 comments

Comments

burdges (Collaborator) commented Feb 4, 2021

We invoke C code for SIMD in galois_8.rs, but galois_16.rs merely uses galois_8.rs.

There is a more bespoke approach to GF(2^16) in "Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions" by James Plank, Kevin Greenan, and Ethan Miller.

@drskalman You looked back into the gf-complete C code recently, yes? I think https://github.com/ceph/gf-complete/blob/master/src/gf_w16.c implements their proposed SIMD for GF(2^16), so could you tell if it's faster?

Replaces #48, I think.
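
For concreteness, here is a minimal scalar sketch of the nibble-table idea from that paper: to multiply a whole buffer by one constant, precompute a 16-entry table per 4-bit nibble of the operand and combine the lookups with XOR; in the SIMD versions each 16-entry byte table becomes a single PSHUFB/TBL lookup. The reduction polynomial 0x1100B (gf-complete's usual choice for w=16) and all names below are assumptions for illustration, not this crate's code.

```rust
// Assumed reduction polynomial for GF(2^16): x^16 + x^12 + x^3 + x + 1.
const POLY: u32 = 0x1100B;

// Plain shift-and-add multiply in GF(2^16), used only to build the tables.
fn gf2p16_mul(mut a: u32, mut b: u32) -> u16 {
    let mut r = 0u32;
    while b != 0 {
        if b & 1 != 0 {
            r ^= a;
        }
        a <<= 1;
        if a & 0x10000 != 0 {
            a ^= POLY;
        }
        b >>= 1;
    }
    r as u16
}

// Multiply every element of `buf` by the constant `c` via four 16-entry tables,
// one per 4-bit nibble of the operand: tables[j][n] = c * (n << 4j).
fn mul_buffer_by_constant(buf: &mut [u16], c: u16) {
    let mut tables = [[0u16; 16]; 4];
    for j in 0..4 {
        for n in 0..16u32 {
            tables[j][n as usize] = gf2p16_mul(c as u32, n << (4 * j));
        }
    }
    for x in buf.iter_mut() {
        let v = *x as usize;
        *x = tables[0][v & 0xF]
            ^ tables[1][(v >> 4) & 0xF]
            ^ tables[2][(v >> 8) & 0xF]
            ^ tables[3][v >> 12];
    }
}
```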

AndersTrier commented

> There is a more bespoke approach to GF(2^16) in "Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions" by James Plank, Kevin Greenan, and Ethan Miller.

I've implemented exactly that in pure Rust for AVX2, SSSE3 and Neon. Feel free to draw inspiration from https://github.com/AndersTrier/reed-solomon-simd/tree/master/src/engine
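
For reference, a hedged SSSE3 sketch of what the inner kernel can look like (not copied from reed-solomon-simd, which organizes its data and engines differently): it assumes the 16-bit words are stored split, 16 low bytes in one block and the corresponding 16 high bytes in another, and that the eight 16-entry nibble tables for the constant (low and high output byte for each of the four input nibbles) have already been built, as in the scalar sketch above, and loaded into `tab_lo`/`tab_hi`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Multiply 16 GF(2^16) words (split into low-byte and high-byte blocks) by the
// constant whose eight 16-entry nibble tables are already in registers.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn mul16_ssse3(
    lo: &mut [u8; 16],    // low bytes of the 16 words
    hi: &mut [u8; 16],    // high bytes of the 16 words
    tab_lo: [__m128i; 4], // low output byte, one table per input nibble
    tab_hi: [__m128i; 4], // high output byte, one table per input nibble
) {
    let mask = _mm_set1_epi8(0x0F);
    let in_lo = _mm_loadu_si128(lo.as_ptr() as *const __m128i);
    let in_hi = _mm_loadu_si128(hi.as_ptr() as *const __m128i);

    // The four 4-bit nibbles of every word, one vector per nibble position.
    let n0 = _mm_and_si128(in_lo, mask);
    let n1 = _mm_and_si128(_mm_srli_epi64::<4>(in_lo), mask);
    let n2 = _mm_and_si128(in_hi, mask);
    let n3 = _mm_and_si128(_mm_srli_epi64::<4>(in_hi), mask);

    // PSHUFB performs 16 parallel table lookups; XOR combines partial products.
    let out_lo = _mm_xor_si128(
        _mm_xor_si128(_mm_shuffle_epi8(tab_lo[0], n0), _mm_shuffle_epi8(tab_lo[1], n1)),
        _mm_xor_si128(_mm_shuffle_epi8(tab_lo[2], n2), _mm_shuffle_epi8(tab_lo[3], n3)),
    );
    let out_hi = _mm_xor_si128(
        _mm_xor_si128(_mm_shuffle_epi8(tab_hi[0], n0), _mm_shuffle_epi8(tab_hi[1], n1)),
        _mm_xor_si128(_mm_shuffle_epi8(tab_hi[2], n2), _mm_shuffle_epi8(tab_hi[3], n3)),
    );

    _mm_storeu_si128(lo.as_mut_ptr() as *mut __m128i, out_lo);
    _mm_storeu_si128(hi.as_mut_ptr() as *mut __m128i, out_hi);
}
```

The same lookup maps to `vqtbl1q_u8` on NEON, and the AVX2 variant does the same with `_mm256_shuffle_epi8` across two 128-bit lanes.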

burdges (Collaborator, Author) commented Nov 26, 2023

At some point I figured out that the approach taken by this crate winds up being obsolete: https://hackmd.io/@rgbPIkIdTwSICPuAq67Jbw/r1B8eRC1u

In https://github.com/paritytech/reed-solomon-novelpoly we ported the leopard C code to Rust, which uses additive FFTs and the formal derivative trick, and we fixed its handling of GF(2^16) and low rates. AFAIK everyone else who cared about erasure coding performance uses the leopard C code directly.
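
As a small aside on the "formal derivative trick": in characteristic 2 the standard derivative of p(x) = Σ aᵢ xⁱ simply keeps the odd-degree coefficients, since i·aᵢ vanishes for even i. A minimal sketch of that standard derivative only; the variant applied alongside leopard's additive FFT basis is more involved.

```rust
// Formal derivative of p(x) = a[0] + a[1] x + a[2] x^2 + ... over a field of
// characteristic 2: the term i * a[i] * x^(i-1) survives only when i is odd.
fn formal_derivative(a: &[u16]) -> Vec<u16> {
    (1..a.len())
        .map(|i| if i % 2 == 1 { a[i] } else { 0 })
        .collect()
}
```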

In https://github.com/w3f/rs-ec-perf we had various optimization ideas: some, like GFNI, we attempted only half-heartedly, and others we never tried at all, given other priorities.

I'd conjecture the optimization really worth doing would be implementing GF(2^12) as a degree-three extension of GF(2^4), with the extension arithmetic done in the log-table form. Maybe also GF(2^16) as a degree-four extension of GF(2^4) in the same style, but that becomes really nasty. Instead of big tables, these need only a handful of GF(2^4) tables, so they avoid cache pressure. It's possible this outperforms in production, even if the big-table implementations win in pure erasure-coding benchmarks.
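
A minimal sketch of what that tower arithmetic could look like, representing GF(2^12) as GF(2^4)[y] / (y^3 + y + 1), with GF(2^4) defined by x^4 + x + 1. Both polynomials are assumptions chosen purely for illustration (the cubic has no roots in GF(2^4), hence is irreducible there); none of this is existing code in the crate.

```rust
const GF16_POLY: u8 = 0x13; // x^4 + x + 1, an assumed polynomial for GF(2^4)

// exp/log tables for GF(2^4); 15 is the order of the multiplicative group,
// and exp is doubled so the sum of two logs never needs a modulo.
fn gf2p4_tables() -> ([u8; 16], [u8; 32]) {
    let mut log = [0u8; 16];
    let mut exp = [0u8; 32];
    let mut v: u8 = 1;
    for i in 0..15 {
        exp[i] = v;
        exp[i + 15] = v;
        log[v as usize] = i as u8;
        v <<= 1;
        if v & 0x10 != 0 {
            v ^= GF16_POLY;
        }
    }
    (log, exp)
}

fn gf2p4_mul(a: u8, b: u8, log: &[u8; 16], exp: &[u8; 32]) -> u8 {
    if a == 0 || b == 0 {
        0
    } else {
        exp[log[a as usize] as usize + log[b as usize] as usize]
    }
}

// Multiply two GF(2^12) elements written as degree-<3 polynomials [c0, c1, c2]
// over GF(2^4), reducing modulo y^3 + y + 1 (so y^3 = y + 1 and y^4 = y^2 + y).
fn gf2p12_mul(a: [u8; 3], b: [u8; 3], log: &[u8; 16], exp: &[u8; 32]) -> [u8; 3] {
    let mut t = [0u8; 5]; // schoolbook product before reduction
    for i in 0..3 {
        for j in 0..3 {
            t[i + j] ^= gf2p4_mul(a[i], b[j], log, exp);
        }
    }
    t[2] ^= t[4]; // y^4 -> y^2 + y
    t[1] ^= t[4];
    t[1] ^= t[3]; // y^3 -> y + 1
    t[0] ^= t[3];
    [t[0], t[1], t[2]]
}
```

The table footprint here is a few dozen bytes, versus the hundreds of kilobytes a full GF(2^16) log/exp table pair occupies, which is the cache-pressure argument above.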
