Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFT::Radix2 uses bit-reversed permutation algorithm #217

Merged
merged 14 commits into from
Aug 31, 2018

Conversation

lamphamsy
Copy link
Contributor

Implement FFT Radix2 using bit-reversal permutation algorithm.

  1. Decimation-in-time FFT
  • Output is initialized as the input in the bit-reversal ordering
  • Cooley-Tukey butterfly is performed on the output

It leads to an O(N*logN) complexity.

Note, we consider particular cases where the input is padded by zeros that always happens in erasure encoding process. Normally, FFT process starts performing butterfly operations on groups of size 2, then 4, 8 etc. Thanks to zero padding, we can start from a group size where there is at most a non-zero element of input in the group, all other elements of the group are zero. Indeed, after performing butterfly operation on the group, all elements equal to the non-zero one.

Given FFT length N, number of data length K, the output is initialized in the following clever way:

  • Each group contains N/K elements
  • Perform on groups containing a non-zero element of input: copy input's element at the bit-reversed index to all elements of the correspondent group of output
  • For other groups, initialize them normally

It leads to an O(N*logK) complexity

  1. Decimation-in-frequency FFT or inverse FFT
  • Gentleman-Sande butterfly is performed on the input
  • Output is copied from input at bit-reversed indices

It leads to an O(N*logN) complexity.

@lamphamsy lamphamsy force-pushed the eh/various_things branch from fd2565c to 5f1b1d5 Compare June 22, 2018 10:45
@lamphamsy lamphamsy force-pushed the ft/fft_bit_reversal_algo branch from f172fa7 to 895eb88 Compare June 22, 2018 10:46
src/fft_2n.h Outdated
@@ -86,32 +86,17 @@ class Radix2 : public FourierTransform<T> {
void fft_inv(vec::Buffers<T>& output, vec::Buffers<T>& input) override;

private:
void _fftp(vec::Buffers<T>& output, vec::Buffers<T>& input, bool inv);
void _fft(vec::Vector<T>& output, vec::Vector<T>& input, bool inv);
void _init_bitrev();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is already private, I don't think you need the leading underscore (identifiers that start with an undrescore are usually reserved by the implementation).

src/fft_2n.h Outdated
int k;
int m; // number of real input elements
int N;
int data_len; // number of real input elements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, could be named real_data_len?

@@ -39,44 +40,43 @@
#include "vec_vector.h"
#include "vec_zero_ext.h"

/** Compute bit-reversed number for a given number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop can be replace by some clever bit twiddling (see https://graphics.stanford.edu/~seander/bithacks.html#BitReverseTable).

inline uint32_t bit_reverse(uint32_t x, uint32_t bit_len)
{
    static const uint8_t bit_reverse_table[256] =
    {
#define R2(n)     n,     n + 2*64,     n + 1*64,     n + 3*64
#define R4(n)  R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
#define R6(n)  R4(n), R4(n + 2*4 ), R4(n + 1*4 ), R4(n + 3*4 )
    R6(0), R6(2), R6(1), R6(3)
    };
#undef R2
#undef R4
#undef R6
    uint32_t res;
    uint8_t *p = reinterpret_cast<uint8_t*>(&x);
    uint8_t *q = reinterpret_cast<uint8_t*>(&res);

    q[3] = bit_reverse_table[p[0]];
    q[2] = bit_reverse_table[p[1]];
    q[1] = bit_reverse_table[p[2]];
    q[0] = bit_reverse_table[p[3]];

    return res >> (32u - bit_len);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do agree that this change is not necessary as the gain is not significant and this computation is performed once

src/gf_ring.h Outdated
* @param buf1 - a buffer of `len` elements
* @param buf2 - a buffer of `len` elements
* @param len - number of elements per buffer
* @return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it returns nothing, then remove the @return

src/gf_ring.h Outdated
/** Butterfly computation for Cooley-Tukey FFT algorithm
*
* Perform in-place oprations on two buffers `P`, `Q` with a coefficient `c`
* \f{eqnarray*}{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check the DOxygen output, but don't we need to skip a line before the \f{eqnarray*}{?

src/gf_ring.h Outdated
* @param buf1 - a buffer of `len` elements
* @param buf2 - a buffer of `len` elements
* @param len - number of elements per buffer
* @return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

src/gf_ring.h Outdated
/** Butterfly computation for Gentleman-Sande FFT algorithm
*
* Perform in-place oprations on two buffers `P`, `Q` with a coefficient `c`
* \f{eqnarray*}{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

}
}

template <>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a copy paste of the 16-bit version…

I wonder if we could factorize this kind of code using a type trait (is_vectorizable) + std::enable_if

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll handle it in #222

}

template <>
void RingModN<uint32_t>::butterfly_gs(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@lamphamsy lamphamsy force-pushed the eh/various_things branch 2 times, most recently from 417c622 to d5454e4 Compare August 7, 2018 11:49
@lamphamsy lamphamsy force-pushed the ft/fft_bit_reversal_algo branch 2 times, most recently from 83bfea1 to 215b90b Compare August 7, 2018 12:25
Implement FFT Radix2 using bit-reversal permutation algorithm.

1. Decimation-in-time FFT
- Output is initialized as the input in the bit-reversal ordering
- Cooley-Tukey butterfly is performed on the output

It leads to an O(N*logN) complexity.

Note, we consider particular cases where the input is padded by zeros
that always happens in erasure encoding process. Normally, FFT process
starts performing butterfly operations on groups of size 2, then 4, 8
etc. Thanks to zero padding, we can start from a group size where there
is at most a non-zero element of input in the group, all other elements
of the group are zero. Indeed, after performing butterfly operation on
the group, all elements equal to the non-zero one.

Given FFT length N, number of data length K, the output is initialized
in the following clever way:
- Each group contains N/K elements
- Perform on groups containing a non-zero element of input: copy input's
element at the bit-reversed index to all elements of the correspondent
group of output
- For other groups, initialize them normally

It leads to an O(N*logK) complexity

2. Decimation-in-frequency FFT or inverse FFT
- Gentleman-Sande butterfly is performed on the input
- Output is copied from input at bit-reversed indices

It leads to an O(N*logN) complexity.
Two butterfly operations are implemented:

1. Butterfly computation for Cooley-Tukey FFT algorithm

Perform in-place oprations on two buffers `P`, `Q` with a coefficient `c`

P[i] = P[i] + c \times Q[i]
Q[i] = P[i] - c \times Q[i]

2. Butterfly computation for Gentleman-Sande FFT algorithm

Perform in-place oprations on two buffers `P`, `Q` with a coefficient `c`

P[i] = P[i] + Q[i]
Q[i] = c \times (P[i] - Q[i])
Thanks to new FFT::Radix2 based on bit-reversal algorithm, only FFT
operation uses data length parameter to decrease complexity. Hence
fft_full is no longer used.
Input vector of buffers could be padded, that happens in encoding.
To avoid extra-cost of padding at input, a zero-out operation is used
for necessary elements of output.
Thanks to the changes at FFT2n, it's not necessarily to pad input vector
@lamphamsy lamphamsy force-pushed the ft/fft_bit_reversal_algo branch 2 times, most recently from aa896a6 to ea654c9 Compare August 29, 2018 06:49
For additive FFT, we support FFT length of at most GF's card. Hence,
the computation of `get_inv_n_mod_p` should be disable for additive FFT
@lamphamsy lamphamsy force-pushed the ft/fft_bit_reversal_algo branch from ea654c9 to 6d1548b Compare August 29, 2018 06:51
- Test with longer length
- Enforce tests for FFT2n: different `data_len`, input length
@lamphamsy lamphamsy force-pushed the ft/fft_bit_reversal_algo branch from 6d1548b to f2ee43c Compare August 29, 2018 07:09
@lamphamsy
Copy link
Contributor Author

@slaperche-scality : thanks for your review. It's ready for next one :)

@lamphamsy
Copy link
Contributor Author

Merging the PR as all comments are addressed.

@lamphamsy lamphamsy merged commit 8dbb617 into eh/various_things Aug 31, 2018
@lamphamsy
Copy link
Contributor Author

Oh, my bad. I forgot to change the base branch.

@lamphamsy lamphamsy deleted the ft/fft_bit_reversal_algo branch October 5, 2018 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants