-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FFT::Radix2 uses bit-reversed permutation algorithm #217
Conversation
fd2565c
to
5f1b1d5
Compare
f172fa7
to
895eb88
Compare
src/fft_2n.h
Outdated
@@ -86,32 +86,17 @@ class Radix2 : public FourierTransform<T> { | |||
void fft_inv(vec::Buffers<T>& output, vec::Buffers<T>& input) override; | |||
|
|||
private: | |||
void _fftp(vec::Buffers<T>& output, vec::Buffers<T>& input, bool inv); | |||
void _fft(vec::Vector<T>& output, vec::Vector<T>& input, bool inv); | |||
void _init_bitrev(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is already private, I don't think you need the leading underscore (identifiers that start with an undrescore are usually reserved by the implementation).
src/fft_2n.h
Outdated
int k; | ||
int m; // number of real input elements | ||
int N; | ||
int data_len; // number of real input elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, could be named real_data_len
?
@@ -39,44 +40,43 @@ | |||
#include "vec_vector.h" | |||
#include "vec_zero_ext.h" | |||
|
|||
/** Compute bit-reversed number for a given number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop can be replace by some clever bit twiddling (see https://graphics.stanford.edu/~seander/bithacks.html#BitReverseTable).
inline uint32_t bit_reverse(uint32_t x, uint32_t bit_len)
{
static const uint8_t bit_reverse_table[256] =
{
#define R2(n) n, n + 2*64, n + 1*64, n + 3*64
#define R4(n) R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
#define R6(n) R4(n), R4(n + 2*4 ), R4(n + 1*4 ), R4(n + 3*4 )
R6(0), R6(2), R6(1), R6(3)
};
#undef R2
#undef R4
#undef R6
uint32_t res;
uint8_t *p = reinterpret_cast<uint8_t*>(&x);
uint8_t *q = reinterpret_cast<uint8_t*>(&res);
q[3] = bit_reverse_table[p[0]];
q[2] = bit_reverse_table[p[1]];
q[1] = bit_reverse_table[p[2]];
q[0] = bit_reverse_table[p[3]];
return res >> (32u - bit_len);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do agree that this change is not necessary as the gain is not significant and this computation is performed once
src/gf_ring.h
Outdated
* @param buf1 - a buffer of `len` elements | ||
* @param buf2 - a buffer of `len` elements | ||
* @param len - number of elements per buffer | ||
* @return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it returns nothing, then remove the @return
src/gf_ring.h
Outdated
/** Butterfly computation for Cooley-Tukey FFT algorithm | ||
* | ||
* Perform in-place oprations on two buffers `P`, `Q` with a coefficient `c` | ||
* \f{eqnarray*}{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check the DOxygen output, but don't we need to skip a line before the \f{eqnarray*}{
?
src/gf_ring.h
Outdated
* @param buf1 - a buffer of `len` elements | ||
* @param buf2 - a buffer of `len` elements | ||
* @param len - number of elements per buffer | ||
* @return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
src/gf_ring.h
Outdated
/** Butterfly computation for Gentleman-Sande FFT algorithm | ||
* | ||
* Perform in-place oprations on two buffers `P`, `Q` with a coefficient `c` | ||
* \f{eqnarray*}{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
} | ||
} | ||
|
||
template <> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a copy paste of the 16-bit version…
I wonder if we could factorize this kind of code using a type trait (is_vectorizable
) + std::enable_if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll handle it in #222
} | ||
|
||
template <> | ||
void RingModN<uint32_t>::butterfly_gs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
417c622
to
d5454e4
Compare
83bfea1
to
215b90b
Compare
d5454e4
to
41cd2e2
Compare
Some minor enhancements
Implement FFT Radix2 using bit-reversal permutation algorithm. 1. Decimation-in-time FFT - Output is initialized as the input in the bit-reversal ordering - Cooley-Tukey butterfly is performed on the output It leads to an O(N*logN) complexity. Note, we consider particular cases where the input is padded by zeros that always happens in erasure encoding process. Normally, FFT process starts performing butterfly operations on groups of size 2, then 4, 8 etc. Thanks to zero padding, we can start from a group size where there is at most a non-zero element of input in the group, all other elements of the group are zero. Indeed, after performing butterfly operation on the group, all elements equal to the non-zero one. Given FFT length N, number of data length K, the output is initialized in the following clever way: - Each group contains N/K elements - Perform on groups containing a non-zero element of input: copy input's element at the bit-reversed index to all elements of the correspondent group of output - For other groups, initialize them normally It leads to an O(N*logK) complexity 2. Decimation-in-frequency FFT or inverse FFT - Gentleman-Sande butterfly is performed on the input - Output is copied from input at bit-reversed indices It leads to an O(N*logN) complexity.
Two butterfly operations are implemented: 1. Butterfly computation for Cooley-Tukey FFT algorithm Perform in-place oprations on two buffers `P`, `Q` with a coefficient `c` P[i] = P[i] + c \times Q[i] Q[i] = P[i] - c \times Q[i] 2. Butterfly computation for Gentleman-Sande FFT algorithm Perform in-place oprations on two buffers `P`, `Q` with a coefficient `c` P[i] = P[i] + Q[i] Q[i] = c \times (P[i] - Q[i])
Thanks to new FFT::Radix2 based on bit-reversal algorithm, only FFT operation uses data length parameter to decrease complexity. Hence fft_full is no longer used.
Input vector of buffers could be padded, that happens in encoding. To avoid extra-cost of padding at input, a zero-out operation is used for necessary elements of output.
Thanks to the changes at FFT2n, it's not necessarily to pad input vector
aa896a6
to
ea654c9
Compare
For additive FFT, we support FFT length of at most GF's card. Hence, the computation of `get_inv_n_mod_p` should be disable for additive FFT
ea654c9
to
6d1548b
Compare
- Test with longer length - Enforce tests for FFT2n: different `data_len`, input length
6d1548b
to
f2ee43c
Compare
@slaperche-scality : thanks for your review. It's ready for next one :) |
Merging the PR as all comments are addressed. |
Oh, my bad. I forgot to change the base branch. |
Implement FFT Radix2 using bit-reversal permutation algorithm.
It leads to an O(N*logN) complexity.
Note, we consider particular cases where the input is padded by zeros that always happens in erasure encoding process. Normally, FFT process starts performing butterfly operations on groups of size 2, then 4, 8 etc. Thanks to zero padding, we can start from a group size where there is at most a non-zero element of input in the group, all other elements of the group are zero. Indeed, after performing butterfly operation on the group, all elements equal to the non-zero one.
Given FFT length N, number of data length K, the output is initialized in the following clever way:
It leads to an O(N*logK) complexity
It leads to an O(N*logN) complexity.