Does FFTW plan reuse make sense? #57

Open
tdd11235813 opened this issue Nov 23, 2016 · 4 comments

@tdd11235813
Contributor

With FFTW_MEASURE, FFTW overwrites the input and output buffers during the planning stage.
After planning, the buffers can be filled with data (memcpy).
Plan reuse means keeping only one plan at a time.
For plans created without FFTW_MEASURE (FFTW_ESTIMATE, or from wisdom), I think it is NOT worthwhile to reuse FFTW plans, as they do not allocate temporary buffers (are we sure?).
But it might be worthwhile in terms of memcpy: we could skip the memcpy as long as no padding is required. The input data coming from BenchmarkExecutor is aligned, but not padded for the FFT, so the memcpy would only be required for padding (in-place real-to-complex).
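Why padding forces the copy: for an in-place real-to-complex transform, FFTW expects the last dimension of the real array to be padded to `2 * (n/2 + 1)` doubles, so tightly packed input cannot be transformed where it lies. A sketch with hypothetical helper names:

```c
#include <stddef.h>
#include <string.h>

/* Doubles per row that an in-place r2c transform needs when the
   last dimension has n real elements: 2 * (n/2 + 1). */
size_t r2c_inplace_row_len(size_t n)
{
    return 2 * (n / 2 + 1);
}

/* Hypothetical helper: copy a contiguous rows x n real array into a
   padded in-place buffer. This is the memcpy that cannot be skipped. */
void copy_into_padded(double *dst, const double *src, size_t rows, size_t n)
{
    size_t padded = r2c_inplace_row_len(n);
    for (size_t r = 0; r < rows; ++r)
        memcpy(dst + r * padded, src + r * n, n * sizeof(double));
    /* The padding slots at the end of each row are left untouched. */
}
```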
I have to look at the results w.r.t. upload and download times.

@psteinb
Contributor

psteinb commented Nov 24, 2016

Any news on this? I wonder whether this idea is relevant for using FFTW or for interpreting the results of gearshifft.

@tdd11235813
Contributor Author

To finally answer this: I plotted the upload time against the total time to get their ratio.

[Figure: rshiny-fftw-upload-total-ratio]

Upload refers to the memcpy operation; in the worst case, the timer measured it at ~40% of the total solution time. But does this really come from the memcpy?

[Figure: rshiny-fftw-download-total-ratio]

Download is the same memcpy operation, just in the other direction. Its timings are smooth and fast, with no significant contribution. So the long upload time might come from cache warm-up.
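One way to probe the warm-up hypothesis is to time the same copy twice into the same freshly allocated buffer and compare the first (cold) run with the repeated (warm) run. A rough sketch in plain C11; `time_cold_warm_copy` is hypothetical, not gearshifft's timer, and `timespec_get` is a wall clock, not a monotonic one:

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock seconds via C11 timespec_get (not monotonic). */
static double now_s(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical helper: copy `bytes` bytes into dst twice and report
   both durations. Returns 0 on success, -1 on error. */
int time_cold_warm_copy(size_t bytes, double *cold_s, double *warm_s)
{
    /* Calling memcpy through a volatile pointer keeps the compiler from
       merging or dropping the two identical copies. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    if (bytes == 0) return -1;
    char *src = malloc(bytes);
    char *dst = malloc(bytes);
    if (!src || !dst) { free(src); free(dst); return -1; }

    memset(src, 1, bytes);          /* touch the source up front */

    double t0 = now_s();
    copy(dst, src, bytes);          /* cold: first write to dst */
    double t1 = now_s();
    copy(dst, src, bytes);          /* warm: dst already touched */
    double t2 = now_s();

    *cold_s = t1 - t0;
    *warm_s = t2 - t1;

    int ok = (dst[bytes - 1] == 1);
    free(src);
    free(dst);
    return ok ? 0 : -1;
}
```

Using a buffer much larger than the last-level cache makes the cold/warm difference easier to see; a large gap supports, but does not prove, the warm-up explanation.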
The memcpy can be avoided when the transform is not an in-place real-to-complex transform and FFTW is not run with FFTW_MEASURE. This would reduce the total runtime by up to ~40% (the worst case above), assuming cache warm-up is the only factor behind the upload time.
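The rule and the estimate, written out as a small sketch. All names are hypothetical; `projected_total_s` simply subtracts the upload time, which holds only under the same assumption that the upload cost disappears entirely when the copy is skipped.

```c
#include <stdbool.h>

/* A staging copy is needed for in-place real-to-complex transforms
   (padding) or when planning with FFTW_MEASURE (buffers get clobbered). */
bool staging_copy_required(bool real_inplace, bool fftw_measure)
{
    return real_inplace || fftw_measure;
}

/* Fraction of the total time spent in the upload memcpy. */
double upload_share(double t_upload_s, double t_total_s)
{
    return t_upload_s / t_total_s;
}

/* Total time if the upload were skipped entirely. */
double projected_total_s(double t_upload_s, double t_total_s)
{
    return t_total_s - t_upload_s;
}
```

With a worst-case share of about 0.4, the projected total is about 60% of the measured one.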

The rshiny tool will get an update to examine such statistics. For now, I do not plan to change the FFTW code in gearshifft to avoid the memcpys in the cases above.

@psteinb
Contributor

psteinb commented Jun 15, 2017

Thanks for the update. Interesting findings, I believe.

Are these results from multi-threaded or single-threaded runs? I am asking because it need not be warm-up only; in a multi-threaded scenario, it could also be cache-line thrashing.

@tdd11235813
Contributor Author

True, this is multi-threaded. The single-threaded benchmark is still running on Taurus; let's see what we get there.
