Enhance FFT Radix2 & vectorisation part #262
1171bdb to 71e24cb
src/simd_fnt.h (outdated)

     const unsigned vec_size = countof<T>();
    -const T max = 1 << (element_size * 8 - 1);
    +const T max = 1U << (sizeof(T) * 8 - 1);
minor: could use `CHAR_BIT` instead of 8 (to be 100% portable, even though 8 is probably enough for the platforms we target).
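A minimal sketch of what that suggestion could look like. The helper name `high_bit` is hypothetical and not part of the PR, and the sketch assumes `T` is an unsigned integer type. It also casts `1` to `T` before shifting, which avoids shifting a 32-bit literal by 63 when `T` is 64 bits wide.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not from the PR): value with only the top bit of T set.
template <typename T>
constexpr T high_bit()
{
    static_assert(std::is_unsigned<T>::value, "sketch assumes an unsigned T");
    // CHAR_BIT is the number of bits per byte, so sizeof(T) * CHAR_BIT is the width of T.
    // Casting 1 to T first keeps the shift valid for 64-bit types.
    return static_cast<T>(static_cast<T>(1) << (sizeof(T) * CHAR_BIT - 1));
}
```

With this shape, `const T max = high_bit<T>();` would replace the hard-coded `8`.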
bfda12f to 5990740
In every block below, the right-hand sides use the values from before the update.

1. Butterfly CT

1.1. single-layer butterfly

For each pair `(P, Q) = (buf[i], buf[i + m])` for `step = 2 * m`
```
coef r1 = W[start * n / (2 * m)]
P = P + r1 * Q
Q = P - r1 * Q
```

1.2. 2-layer butterfly

For each quadruple `(P, Q, R, S) = (buf[i], buf[i + m], buf[i + 2 * m], buf[i + 3 * m])`

First layer: butterfly on `(P, Q)` and `(R, S)` for `step = 2 * m`
```
coef r1 = W[start * n / (2 * m)]
P = P + r1 * Q
Q = P - r1 * Q
R = R + r1 * S
S = R - r1 * S
```

Second layer: butterfly on `(P, R)` and `(Q, S)` for `step = 4 * m`
```
coef r2 = W[start * n / (4 * m)]
coef r3 = W[(start + m) * n / (4 * m)]
P = P + r2 * R
R = P - r2 * R
Q = Q + r3 * S
S = Q - r3 * S
```

2. Butterfly GS

2.1. single-layer butterfly

For each pair `(P, Q) = (buf[i], buf[i + m])` for `step = 2 * m`
```
coef r = inv_W[start * n / (2 * m)]
P = P + Q
Q = r * (P - Q)
```
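The butterflies described above can be sketched in scalar C++ as follows. This is an illustration under stated assumptions, not the code from the PR: the function names are hypothetical, arithmetic is done modulo the Fermat prime 65537 purely for the example, and all inputs are assumed to be already reduced. Each function saves the old values first, so both outputs are computed from the inputs rather than from a half-updated pair.

```cpp
#include <cstdint>

// Assumption: arithmetic modulo the Fermat prime 65537, chosen only for illustration.
constexpr uint64_t kMod = 65537;

// All helpers assume a < kMod and b < kMod.
inline uint64_t mod_add(uint64_t a, uint64_t b) { return (a + b) % kMod; }
inline uint64_t mod_sub(uint64_t a, uint64_t b) { return (a + kMod - b) % kMod; }
inline uint64_t mod_mul(uint64_t a, uint64_t b) { return (a * b) % kMod; }

// CT (Cooley-Tukey) butterfly: (P, Q) -> (P + r * Q, P - r * Q).
inline void butterfly_ct_sketch(uint64_t& p, uint64_t& q, uint64_t r)
{
    const uint64_t p_old = p;
    const uint64_t t = mod_mul(r, q);
    p = mod_add(p_old, t);
    q = mod_sub(p_old, t);
}

// GS (Gentleman-Sande) butterfly: (P, Q) -> (P + Q, r * (P - Q)).
inline void butterfly_gs_sketch(uint64_t& p, uint64_t& q, uint64_t r)
{
    const uint64_t p_old = p;
    p = mod_add(p_old, q);
    q = mod_mul(r, mod_sub(p_old, q));
}

// Two CT layers on one quadruple (P, Q, R, S):
// first layer on (P, Q) and (R, S) with r1,
// second layer on (P, R) with r2 and on (Q, S) with r3.
inline void butterfly_ct_two_layers_sketch(
    uint64_t& p, uint64_t& q, uint64_t& r, uint64_t& s,
    uint64_t r1, uint64_t r2, uint64_t r3)
{
    butterfly_ct_sketch(p, q, r1);
    butterfly_ct_sketch(r, s, r1);
    butterfly_ct_sketch(p, r, r2);
    butterfly_ct_sketch(q, s, r3);
}
```

A quick sanity check on the pair of definitions: applying the GS butterfly with the inverse coefficient after the CT butterfly yields `(2 * P, 2 * Q)`, which is the factor of 2 per layer that an inverse transform normally divides out at the end.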
The SIMD parts will be re-implemented in the next commits.
1. Essential operations
   - simd_128.h contains essential wrappers of SIMD operations on SSE
   - simd_256.h contains essential wrappers of SIMD operations on AVX
2. Basic operations
   - simd_basic.h contains basic operations used in the cases below, as well as operations for RingModN
3. Vectorized operations
   - simd_fnt.h contains vectorized operations dedicated to FNT
   - simd_nf4.h contains vectorized operations dedicated to nf4
It implements basic operations that will be used everywhere. It also includes operations for RingModN.
- Indices for SIMD parts are computed once in the FFT function
- Define butterfly_ct_two_layers_step_slow for non-vectorized functions
4e24a3c to 6d974f4
Great job! Let's merge it :)
@slaperche-scality thanks a lot!
This branch is the target of two PRs:
- Part1: Radix-2 FFT enhancement by working on two layers per iteration #253
- Part2: Vectorize operations for Radix-2 FFT & re-work vectorization part #254

This PR becomes active once those two PRs are merged.