Enhance FFT Radix2 & vectorisation part #262

Merged
merged 15 commits into from
Dec 20, 2018

Conversation

lamphamsy

@lamphamsy lamphamsy commented Oct 30, 2018

The branch is the target of the two PRs:

This PR becomes active once both PRs are merged.

@lamphamsy lamphamsy added the Standby It's waiting other works label Oct 30, 2018
@lamphamsy lamphamsy mentioned this pull request Oct 31, 2018
1 task
@vrancurel vrancurel added the Release 0.1 Initial Release label Dec 14, 2018
@lamphamsy lamphamsy removed the Standby It's waiting other works label Dec 20, 2018
@lamphamsy lamphamsy force-pushed the eh/fft_radix2_and_re_work_simd branch from 1171bdb to 71e24cb Compare December 20, 2018 14:18
src/simd_fnt.h Outdated
const unsigned vec_size = countof<T>();
- const T max = 1 << (element_size * 8 - 1);
+ const T max = 1U << (sizeof(T) * 8 - 1);

minor: could use CHAR_BIT instead of 8 (to be 100% portable, even though 8 is probably fine for the platforms we target).
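As a rough illustration of the suggestion above, the top-bit constant could be computed with `CHAR_BIT` from `<climits>`. The helper name `high_bit` is hypothetical and not part of the PR; it also assumes `T` is an unsigned integer type.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not from the PR): returns the value of type T
// that has only its most significant bit set.
template <typename T>
constexpr T high_bit()
{
    static_assert(std::is_unsigned<T>::value, "T is assumed to be unsigned");
    // Cast the 1 to T before shifting so the shift is performed at least
    // at the width of T, then cast back in case of integer promotion.
    return static_cast<T>(static_cast<T>(1) << (sizeof(T) * CHAR_BIT - 1));
}
```

One design point in this sketch: the literal `1U` has type `unsigned int`, so if `T` were ever a 64-bit type, shifting `1U` by 63 would exceed the width of a 32-bit `unsigned int`. Casting the 1 to `T` first avoids that case.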

@lamphamsy lamphamsy force-pushed the eh/fft_radix2_and_re_work_simd branch from bfda12f to 5990740 Compare December 20, 2018 14:27
1. Butterfly CT

1.1. single-layer butterfly

For each pair `(P, Q) = (buf[i], buf[i + m])` for `step = 2 * m`

```
      coef r1 = W[start * n / (2 * m)]
      t = r1 * Q
      Q = P - t
      P = P + t
```
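The single-layer CT butterfly above can be sketched in scalar C++ as follows. This is only an illustration under stated assumptions: arithmetic is done modulo the Fermat prime 257, the name `butterfly_ct_layer` is hypothetical, and the twiddle factor `r1` (that is, `W[start * n / (2 * m)]`) is looked up by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption for illustration: arithmetic modulo the Fermat prime 257.
constexpr std::uint32_t kQ = 257;

// Hypothetical helper, not the PR's code: one CT layer with half-distance m.
// `r1` stands for W[start * n / (2 * m)], looked up by the caller.
// Entries of buf and r1 are assumed to be already reduced modulo kQ.
void butterfly_ct_layer(std::vector<std::uint32_t>& buf,
                        std::uint32_t r1,
                        std::size_t start,
                        std::size_t m)
{
    assert(m > 0);
    const std::size_t step = 2 * m;
    for (std::size_t i = start; i + m < buf.size(); i += step) {
        const std::uint32_t p = buf[i];
        const std::uint32_t t = (r1 * buf[i + m]) % kQ; // r1 * Q, fits in 32 bits
        buf[i] = (p + t) % kQ;                          // P + r1 * Q
        buf[i + m] = (p + kQ - t) % kQ;                 // P - r1 * Q, kept non-negative
    }
}
```

Saving the old `P` and the product `r1 * Q` before writing back is what keeps the second output from reading an already updated `P`.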

1.2. 2-layer butterfly

For each quadruple `(P, Q, R, S) = (buf[i], buf[i + m], buf[i + 2 * m],
buf[i + 3 * m])`

First layer: butterfly on `(P, Q)` and `(R, S)` for `step = 2 * m`

```
      coef r1 = W[start * n / (2 * m)]
      t = r1 * Q
      Q = P - t
      P = P + t
      u = r1 * S
      S = R - u
      R = R + u
```

Second layer: butterfly on `(P, R)` and `(Q, S)` for `step = 4 * m`

```
      coef r2 = W[start * n / (4 * m)]
      coef r3 = W[(start + m) * n / (4 * m)]
      t = r2 * R
      R = P - t
      P = P + t
      u = r3 * S
      S = Q - u
      Q = Q + u
```
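A scalar sketch of the two-layer CT butterfly on one quadruple might look like the following. The names `Quad`, `butterfly_ct` and `two_layer_ct_sketch` are hypothetical, arithmetic modulo 257 is an assumption made for the example, and the three twiddle factors are passed in rather than read from `W`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: arithmetic modulo the Fermat prime 257.
constexpr std::uint32_t kQ = 257;

// Hypothetical helper: in-place CT butterfly (a, b) -> (a + r*b, a - r*b).
inline void butterfly_ct(std::uint32_t& a, std::uint32_t& b, std::uint32_t r)
{
    assert(a < kQ && b < kQ && r < kQ);
    const std::uint32_t t = (r * b) % kQ;
    const std::uint32_t old_a = a;
    a = (old_a + t) % kQ;
    b = (old_a + kQ - t) % kQ;
}

// Hypothetical container for the quadruple
// (buf[i], buf[i + m], buf[i + 2 * m], buf[i + 3 * m]).
struct Quad {
    std::uint32_t P, Q, R, S;
};

// Two CT layers applied to one quadruple without going back to memory.
inline void two_layer_ct_sketch(Quad& x,
                                std::uint32_t r1,
                                std::uint32_t r2,
                                std::uint32_t r3)
{
    // First layer: (P, Q) and (R, S) share the coefficient r1.
    butterfly_ct(x.P, x.Q, r1);
    butterfly_ct(x.R, x.S, r1);
    // Second layer: (P, R) uses r2, (Q, S) uses r3.
    butterfly_ct(x.P, x.R, r2);
    butterfly_ct(x.Q, x.S, r3);
}
```

With `m = 1`, `r1 = r2 = 1` and `r3 = 16` (a primitive 4th root of unity modulo 257), applying this to the bit-reversed input `(1, 3, 2, 4)` yields the 4-point transform of `(1, 2, 3, 4)` in natural order.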

2. Butterfly GS

2.1. single-layer butterfly

For each pair `(P, Q) = (buf[i], buf[i + m])` for `step = 2 * m`

```
      coef r = inv_W[start * n / (2 * m)]
      t = P - Q
      P = P + Q
      Q = r * t
```
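The GS butterfly can be sketched the same way. Again this is only an illustration: the name `butterfly_gs` is hypothetical, arithmetic modulo 257 is an assumption, and `r` stands for the inverse twiddle factor `inv_W[start * n / (2 * m)]`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: arithmetic modulo the Fermat prime 257.
constexpr std::uint32_t kQ = 257;

// Hypothetical helper: in-place GS butterfly (P, Q) -> (P + Q, r * (P - Q)).
inline void butterfly_gs(std::uint32_t& p, std::uint32_t& q, std::uint32_t r)
{
    assert(p < kQ && q < kQ && r < kQ);
    const std::uint32_t diff = (p + kQ - q) % kQ; // P - Q, kept non-negative
    p = (p + q) % kQ;                             // P + Q
    q = (r * diff) % kQ;                          // r * (P - Q)
}
```

As a sanity check on the pairing with the CT form: applying a CT butterfly with coefficient 16 to `(5, 7)` gives `(117, 150)`, and a GS butterfly with the inverse coefficient 241 maps that back to `(10, 14)`, which is the original pair scaled by 2.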
The SIMD parts will be re-implemented in the next commits.
1. Essential operations

- simd_128.h contains essential wrappers of SIMD operations on SSE
- simd_256.h contains essential wrappers of SIMD operations on AVX

2. Basic operations

- simd_basic.h contains basic operations used in the following cases, and
also operations for RingModN

3. Vectorized operations

- simd_fnt.h contains vectorized operations dedicated for FNT
- simd_nf4.h contains vectorized operations dedicated for nf4
It implements basic operations that will be used everywhere.
It also includes operations for RingModN.
- Indices for SIMD parts are computed once in FFT function
- Define butterfly_ct_two_layers_step_slow for non-vectorized functions
@lamphamsy lamphamsy force-pushed the eh/fft_radix2_and_re_work_simd branch from 4e24a3c to 6d974f4 Compare December 20, 2018 15:23

@slaperche-scality slaperche-scality left a comment


Great job! Let's merge it :)

@lamphamsy

@slaperche-scality thanks a lot!

@lamphamsy lamphamsy merged commit c84f572 into master Dec 20, 2018
@lamphamsy lamphamsy deleted the eh/fft_radix2_and_re_work_simd branch March 21, 2019 15:34
Labels
Release 0.1 Initial Release