FFT::Radix2 uses bit-reversed permutation algorithm #217

lamphamsy · 2018-06-22T08:13:44Z

Implement FFT Radix2 using bit-reversal permutation algorithm.

1. Decimation-in-time FFT

- Output is initialized as the input in the bit-reversal ordering
- Cooley-Tukey butterfly is performed on the output

This leads to O(N*logN) complexity.

Note that we also handle the particular case where the input is padded with zeros, which always happens in the erasure encoding process. Normally, the FFT starts by performing butterfly operations on groups of size 2, then 4, 8, etc. Thanks to zero padding, we can start from a group size at which each group contains at most one non-zero input element, all other elements of the group being zero. Indeed, after the butterfly operations are performed on such a group, all of its elements are equal to the non-zero one.

Given an FFT length N and a data length K, the output is initialized as follows:

- Each group contains N/K elements
- For groups containing a non-zero input element: copy the input element at the bit-reversed index to all elements of the corresponding output group
- Other groups are initialized normally

Only the remaining log(K) butterfly stages are then performed, which leads to O(N*logK) complexity.
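The initialization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: it works modulo the Fermat prime 257 with 3 as a primitive root, it rounds `data_len` up to a power of two to obtain K, and the names `fft_dit`, `dft_naive`, `pow_mod` and `kMod` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption for this sketch: arithmetic modulo the Fermat prime 257.
constexpr uint32_t kMod = 257;

// Modular exponentiation by repeated squaring.
uint32_t pow_mod(uint32_t base, uint32_t exp)
{
    uint32_t result = 1;
    base %= kMod;
    while (exp > 0) {
        if (exp & 1u) result = result * base % kMod;
        base = base * base % kMod;
        exp >>= 1;
    }
    return result;
}

// Reverse the lowest `bit_len` bits of `x` (simple loop version).
uint32_t bit_reverse(uint32_t x, uint32_t bit_len)
{
    uint32_t r = 0;
    for (uint32_t i = 0; i < bit_len; ++i) r = (r << 1) | ((x >> i) & 1u);
    return r;
}

// Decimation-in-time FFT of length n (a power of two, n <= 256).
// Only input[0 .. data_len) is read; the rest is treated as zero padding.
std::vector<uint32_t> fft_dit(const std::vector<uint32_t>& input,
                              uint32_t n, uint32_t data_len)
{
    assert(n >= 1 && n <= 256 && (n & (n - 1)) == 0);
    assert(data_len >= 1 && data_len <= n && input.size() >= data_len);

    uint32_t log_n = 0;
    while ((1u << log_n) < n) ++log_n;
    uint32_t k = 1;  // K: data_len rounded up to a power of two
    while (k < data_len) k *= 2;
    const uint32_t group = n / k;

    // input[i] lands at the head of a group in bit-reversed order, and the
    // first log2(group) butterfly stages would only copy it across the
    // group, so fill the group directly. Groups with no input stay zero.
    std::vector<uint32_t> out(n, 0);
    for (uint32_t i = 0; i < data_len; ++i) {
        const uint32_t head = bit_reverse(i, log_n);
        for (uint32_t j = 0; j < group; ++j) out[head + j] = input[i] % kMod;
    }

    // Remaining log2(K) stages of Cooley-Tukey butterflies.
    const uint32_t w_n = pow_mod(3, 256 / n);  // primitive n-th root of unity
    for (uint32_t len = 2 * group; len <= n; len *= 2) {
        const uint32_t w_len = pow_mod(w_n, n / len);
        for (uint32_t start = 0; start < n; start += len) {
            uint32_t w = 1;
            for (uint32_t j = 0; j < len / 2; ++j) {
                const uint32_t p = out[start + j];
                const uint32_t q = w * out[start + j + len / 2] % kMod;
                out[start + j] = (p + q) % kMod;
                out[start + j + len / 2] = (p + kMod - q) % kMod;
                w = w * w_len % kMod;
            }
        }
    }
    return out;
}

// Naive O(N^2) DFT, used only as a reference to check the sketch.
std::vector<uint32_t> dft_naive(const std::vector<uint32_t>& input, uint32_t n)
{
    const uint32_t w_n = pow_mod(3, 256 / n);
    std::vector<uint32_t> out(n, 0);
    for (uint32_t k = 0; k < n; ++k)
        for (uint32_t i = 0; i < input.size() && i < n; ++i)
            out[k] = (out[k] + input[i] % kMod * pow_mod(w_n, (i * k) % n)) % kMod;
    return out;
}
```

For example, calling `fft_dit` with 3 data elements and N = 16 fills groups of 4 elements and runs only the last two butterfly stages; the result should match `dft_naive` on the same zero-padded input.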

2. Decimation-in-frequency FFT or inverse FFT

- Gentleman-Sande butterfly is performed on the input
- Output is copied from input at bit-reversed indices

This leads to O(N*logN) complexity.
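The decimation-in-frequency variant can be sketched in the same spirit. Again this is only an illustration under assumptions: arithmetic modulo 257 with 3 as a primitive root, a forward transform only, and hypothetical names (`fft_dif`, `dft_naive`, `pow_mod`, `kMod`). The butterflies run on a working copy of the input, then the output is read at bit-reversed indices.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption for this sketch: arithmetic modulo the Fermat prime 257.
constexpr uint32_t kMod = 257;

// Modular exponentiation by repeated squaring.
uint32_t pow_mod(uint32_t base, uint32_t exp)
{
    uint32_t result = 1;
    base %= kMod;
    while (exp > 0) {
        if (exp & 1u) result = result * base % kMod;
        base = base * base % kMod;
        exp >>= 1;
    }
    return result;
}

// Reverse the lowest `bit_len` bits of `x` (simple loop version).
uint32_t bit_reverse(uint32_t x, uint32_t bit_len)
{
    uint32_t r = 0;
    for (uint32_t i = 0; i < bit_len; ++i) r = (r << 1) | ((x >> i) & 1u);
    return r;
}

// Decimation-in-frequency FFT; the length must be a power of two <= 256.
// `work` is taken by value so the caller's input is left untouched.
std::vector<uint32_t> fft_dif(std::vector<uint32_t> work)
{
    const uint32_t n = static_cast<uint32_t>(work.size());
    assert(n >= 1 && n <= 256 && (n & (n - 1)) == 0);
    uint32_t log_n = 0;
    while ((1u << log_n) < n) ++log_n;
    for (auto& v : work) v %= kMod;

    // Gentleman-Sande butterflies, from the largest group down to size 2.
    const uint32_t w_n = pow_mod(3, 256 / n);  // primitive n-th root of unity
    for (uint32_t len = n; len >= 2; len /= 2) {
        const uint32_t w_len = pow_mod(w_n, n / len);
        for (uint32_t start = 0; start < n; start += len) {
            uint32_t w = 1;
            for (uint32_t j = 0; j < len / 2; ++j) {
                const uint32_t p = work[start + j];
                const uint32_t q = work[start + j + len / 2];
                work[start + j] = (p + q) % kMod;
                work[start + j + len / 2] = w * ((p + kMod - q) % kMod) % kMod;
                w = w * w_len % kMod;
            }
        }
    }

    // The butterflies leave the result in bit-reversed order.
    std::vector<uint32_t> out(n);
    for (uint32_t i = 0; i < n; ++i) out[i] = work[bit_reverse(i, log_n)];
    return out;
}

// Naive O(N^2) DFT, used only as a reference to check the sketch.
std::vector<uint32_t> dft_naive(const std::vector<uint32_t>& input)
{
    const uint32_t n = static_cast<uint32_t>(input.size());
    const uint32_t w_n = pow_mod(3, 256 / n);
    std::vector<uint32_t> out(n, 0);
    for (uint32_t k = 0; k < n; ++k)
        for (uint32_t i = 0; i < n; ++i)
            out[k] = (out[k] + input[i] % kMod * pow_mod(w_n, (i * k) % n)) % kMod;
    return out;
}
```

Unlike the decimation-in-time sketch, every stage runs here, since zeros in the input do not let the first stages be skipped; this matches the O(N*logN) figure stated above.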

slaperche-scality · 2018-07-27T08:50:30Z

src/fft_2n.h

@@ -86,32 +86,17 @@ class Radix2 : public FourierTransform<T> {
    void fft_inv(vec::Buffers<T>& output, vec::Buffers<T>& input) override;

  private:
-    void _fftp(vec::Buffers<T>& output, vec::Buffers<T>& input, bool inv);
-    void _fft(vec::Vector<T>& output, vec::Vector<T>& input, bool inv);
+    void _init_bitrev();


The function is already private, so I don't think you need the leading underscore (identifiers that start with an underscore are usually reserved by the implementation).

slaperche-scality · 2018-07-27T08:51:12Z

src/fft_2n.h

    int k;
-    int m; // number of real input elements
-    int N;
+    int data_len; // number of real input elements


so, could be named real_data_len?

slaperche-scality · 2018-07-27T08:58:08Z

src/fft_2n.h

@@ -39,44 +40,43 @@
 #include "vec_vector.h"
 #include "vec_zero_ext.h"

+/** Compute bit-reversed number for a given number


This loop can be replaced by some clever bit twiddling (see https://graphics.stanford.edu/~seander/bithacks.html#BitReverseTable).

    inline uint32_t bit_reverse(uint32_t x, uint32_t bit_len)
    {
        static const uint8_t bit_reverse_table[256] = {
    #define R2(n) n, n + 2*64, n + 1*64, n + 3*64
    #define R4(n) R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
    #define R6(n) R4(n), R4(n + 2*4 ), R4(n + 1*4 ), R4(n + 3*4 )
            R6(0), R6(2), R6(1), R6(3)
        };
    #undef R2
    #undef R4
    #undef R6
        uint32_t res;
        uint8_t *p = reinterpret_cast<uint8_t*>(&x);
        uint8_t *q = reinterpret_cast<uint8_t*>(&res);
        q[3] = bit_reverse_table[p[0]];
        q[2] = bit_reverse_table[p[1]];
        q[1] = bit_reverse_table[p[2]];
        q[0] = bit_reverse_table[p[3]];
        return res >> (32u - bit_len);
    }

We agreed that this change is not necessary: the gain is not significant and this computation is performed only once.

slaperche-scality · 2018-07-27T09:15:09Z

src/gf_ring.h

+ * @param buf1 - a buffer of `len` elements
+ * @param buf2 - a buffer of `len` elements
+ * @param len - number of elements per buffer
+ * @return


If it returns nothing, then remove the `@return`.

slaperche-scality · 2018-07-27T09:15:48Z

src/gf_ring.h

+/** Butterfly computation for Cooley-Tukey FFT algorithm
+ *
+ * Perform in-place oprations on two buffers `P`, `Q` with a coefficient `c`
+ * \f{eqnarray*}{


I didn't check the Doxygen output, but don't we need to skip a line before the \f{eqnarray*}{?

slaperche-scality · 2018-07-27T09:15:59Z

src/gf_ring.h

+ * @param buf1 - a buffer of `len` elements
+ * @param buf2 - a buffer of `len` elements
+ * @param len - number of elements per buffer
+ * @return


slaperche-scality · 2018-07-27T09:16:04Z

src/gf_ring.h

+/** Butterfly computation for Gentleman-Sande FFT algorithm
+ *
+ * Perform in-place oprations on two buffers `P`, `Q` with a coefficient `c`
+ * \f{eqnarray*}{


slaperche-scality · 2018-07-27T09:44:24Z

src/gf_ring.cpp

+    }
+}
+
+template <>


This is a copy paste of the 16-bit version…

I wonder if we could factorize this kind of code using a type trait (is_vectorizable) + std::enable_if

I'll handle it in #222
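For illustration only, the type-trait idea mentioned above could look roughly like the sketch below. Everything in it is hypothetical (the `is_vectorizable` specializations, the `Path` enum and `butterfly_path`); it only shows how `std::enable_if` can select one overload for a family of types instead of copy-pasting a specialization per type, and it is not the code that went into #222.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true for element types that have a SIMD code path.
template <typename T>
struct is_vectorizable : std::false_type {};
template <>
struct is_vectorizable<uint16_t> : std::true_type {};
template <>
struct is_vectorizable<uint32_t> : std::true_type {};

// Which implementation a call would be dispatched to (illustrative only).
enum class Path { Scalar, Simd };

// Selected for every vectorizable type: one definition replaces the
// per-type specializations that would otherwise be copy-pasted.
template <typename T>
typename std::enable_if<is_vectorizable<T>::value, Path>::type
butterfly_path()
{
    return Path::Simd;
}

// Fallback selected for all other types.
template <typename T>
typename std::enable_if<!is_vectorizable<T>::value, Path>::type
butterfly_path()
{
    return Path::Scalar;
}
```

With these definitions, `butterfly_path<uint16_t>()` and `butterfly_path<uint32_t>()` resolve to the SIMD overload, while any type without a specialization of the trait falls back to the scalar one.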

slaperche-scality · 2018-07-27T09:44:39Z

src/gf_ring.cpp

+}
+
+template <>
+void RingModN<uint32_t>::butterfly_gs(



Two butterfly operations are implemented (in both, the right-hand sides use the values of `P[i]` and `Q[i]` from before the update):

1. Butterfly computation for Cooley-Tukey FFT algorithm

   Perform in-place operations on two buffers `P`, `Q` with a coefficient `c`

       P[i] = P[i] + c \times Q[i]
       Q[i] = P[i] - c \times Q[i]

2. Butterfly computation for Gentleman-Sande FFT algorithm

   Perform in-place operations on two buffers `P`, `Q` with a coefficient `c`

       P[i] = P[i] + Q[i]
       Q[i] = c \times (P[i] - Q[i])
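A scalar sketch of the two butterflies over whole buffers is shown below. It is not the SIMD code from this PR: the modulus 257 and the function signatures are assumptions made for the example. It mainly illustrates that the old value of `P[i]` has to be saved before it is overwritten, since `Q[i]` is computed from it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption for this sketch: arithmetic modulo the Fermat prime 257.
constexpr uint32_t kMod = 257;

// Cooley-Tukey butterfly: P[i] <- P[i] + c*Q[i], Q[i] <- P[i] - c*Q[i],
// where both right-hand sides use the values from before the update.
void butterfly_ct(uint32_t c, std::vector<uint32_t>& p, std::vector<uint32_t>& q)
{
    assert(p.size() == q.size());
    for (std::size_t i = 0; i < p.size(); ++i) {
        const uint32_t a = p[i] % kMod;                      // old P[i]
        const uint32_t b = c % kMod * (q[i] % kMod) % kMod;  // c * old Q[i]
        p[i] = (a + b) % kMod;
        q[i] = (a + kMod - b) % kMod;
    }
}

// Gentleman-Sande butterfly: P[i] <- P[i] + Q[i], Q[i] <- c*(P[i] - Q[i]),
// again using the values from before the update.
void butterfly_gs(uint32_t c, std::vector<uint32_t>& p, std::vector<uint32_t>& q)
{
    assert(p.size() == q.size());
    for (std::size_t i = 0; i < p.size(); ++i) {
        const uint32_t a = p[i] % kMod;  // old P[i]
        const uint32_t b = q[i] % kMod;  // old Q[i]
        p[i] = (a + b) % kMod;
        q[i] = c % kMod * ((a + kMod - b) % kMod) % kMod;
    }
}
```

For instance, with `c = 2`, `P = {1, 2}` and `Q = {3, 4}`, the Cooley-Tukey butterfly gives `P = {7, 10}` and `Q = {252, 251}` (that is, -5 and -6 modulo 257), while the Gentleman-Sande butterfly gives `P = {4, 6}` and `Q = {253, 253}`.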


lamphamsy · 2018-08-29T07:28:18Z

@slaperche-scality : thanks for your review. It's ready for next one :)

lamphamsy · 2018-08-31T08:53:00Z

Merging the PR as all comments are addressed.

lamphamsy · 2018-08-31T08:57:41Z

Oh, my bad. I forgot to change the base branch.

lamphamsy force-pushed the eh/various_things branch from fd2565c to 5f1b1d5 Compare June 22, 2018 10:45

lamphamsy force-pushed the ft/fft_bit_reversal_algo branch from f172fa7 to 895eb88 Compare June 22, 2018 10:46

vrancurel approved these changes Jun 24, 2018


slaperche-scality suggested changes Jul 27, 2018


lamphamsy force-pushed the eh/various_things branch 2 times, most recently from 417c622 to d5454e4 Compare August 7, 2018 11:49

lamphamsy force-pushed the ft/fft_bit_reversal_algo branch 2 times, most recently from 83bfea1 to 215b90b Compare August 7, 2018 12:25

lamphamsy force-pushed the eh/various_things branch from d5454e4 to 41cd2e2 Compare August 27, 2018 04:58

lamphamsy added 11 commits August 28, 2018 11:34

Merge pull request #216 from scality/eh/various_things

6341ee3

Some minor enhancements

Add reference for FFT using bit-reversal algo

2c47e88

GF Ring: butterly using SIMD

33ecfea

SIMD: perform butterly operations on vectors

9402e32

FEC base: read/write packet uses stream reference

27c61a8

FEC classes: get rid of fft_full

001126c

Thanks to the new FFT::Radix2 based on the bit-reversal algorithm, only the FFT operation uses the data length parameter to reduce complexity. Hence `fft_full` is no longer used.

FFT2n: read only useful part of input vectors

dbea690

The input vector of buffers may be padded, which happens in encoding. To avoid the extra cost of padding the input, the necessary elements of the output are zeroed out instead.

FEC FNT: do not need to pad input for encoding

fa95197

Thanks to the changes in FFT2n, it is no longer necessary to pad the input vector.

FFT2N: rename init_bitrev

d50396f

lamphamsy force-pushed the ft/fft_bit_reversal_algo branch 2 times, most recently from aa896a6 to ea654c9 Compare August 29, 2018 06:49

FFT base: do not need to compute inv(n) for additive

54e9ded

For additive FFT, we support an FFT length of at most the GF's cardinality. Hence, the computation of `get_inv_n_mod_p` should be disabled for additive FFT.

lamphamsy force-pushed the ft/fft_bit_reversal_algo branch from ea654c9 to 6d1548b Compare August 29, 2018 06:51

lamphamsy added 2 commits August 29, 2018 09:08

FFT test: enforce tests

0343548

- Test with longer length
- Enforce tests for FFT2n: different `data_len`, input length

GF ring: address comments

f2ee43c

lamphamsy force-pushed the ft/fft_bit_reversal_algo branch from 6d1548b to f2ee43c Compare August 29, 2018 07:09

lamphamsy merged commit 8dbb617 into eh/various_things Aug 31, 2018

lamphamsy deleted the ft/fft_bit_reversal_algo branch October 5, 2018 14:17
