Polyester makes FFTW slower #63

Open
tknopp opened this issue Jan 23, 2022 · 7 comments

Comments

tknopp commented Jan 23, 2022

Hi Chris,

I have tracked down a multi-threading performance issue and am now at a point where I am sure that Polyester is the cause (or my usage of Polyester).

I put together an MWE. One needs to dev NFFT and then execute
https://github.com/JuliaMath/NFFT.jl/blob/master/benchmark/performance_issue_mt.jl
with julia -t 32 (or fewer threads).

Basically, I have a Polyester-threaded loop (@batch) followed by a call to a multi-threaded FFT. The threaded Julia code makes the subsequent FFTW call slower: in my setting, the FFT was slowed down by a factor of 5. Interestingly, if I switch from @batch to Threads.@threads, there is no performance issue.
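For readers who do not want to check out NFFT, the pattern described above can be sketched roughly as follows. This is not the linked MWE: the `scale!` kernel and the array size are made up for illustration, and it assumes Polyester and FFTW are installed and Julia was started with several threads.

```julia
using Polyester: @batch
using FFTW

# Hypothetical stand-in for the threaded loop that runs before the FFT.
function scale!(y, x)
    @batch for i in eachindex(y, x)
        y[i] = 2 * x[i]
    end
    return y
end

FFTW.set_num_threads(Threads.nthreads())  # let FFTW use all Julia threads
x = rand(ComplexF64, 2^20)
y = similar(x)
p = plan_fft(y)              # plan up front so only execution is timed

scale!(y, x); p * y          # warm-up run to exclude compilation time

scale!(y, x)                 # Polyester-threaded loop
t_first  = @elapsed p * y    # FFT directly after @batch
t_second = @elapsed p * y    # same FFT again
println("first FFT: ", t_first, " s, second FFT: ", t_second, " s")
```

Replacing `@batch` with `Threads.@threads` in `scale!` gives the variant to compare against.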

To toggle between the two versions, one needs to change this line: https://github.com/JuliaMath/NFFT.jl/blob/master/src/utils.jl#L6
and restart Julia. I hope this description is sufficient.

Member

chriselrod commented Jan 23, 2022

This is probably the usual problem of Polyester not playing well with base threads, given that FFTW seems to use them (JuliaMath/FFTW.jl#105).

Does the second @time also show that FFTW is slower?
How long is the delay between @batch finishing and FFTW starting to run? If you have a delay of a few milliseconds, is FFTW still slower?

Author

tknopp commented Jan 23, 2022

> Does the second @time also show that FFTW is slower?

No, that's why I make two calls. It's just the first that is affected.

Member

chriselrod commented Jan 23, 2022

> > Does the second @time also show that FFTW is slower?
>
> No, that's why I make two calls. It's just the first that is affected.

That's what I expected.
The reason Polyester makes Threads.@threads and Threads.@spawn slower is because the tasks Polyester's threads run on spend some time looking for new work to do before giving up.
If you run @batch, LoopVectorization.@tturbo, or Octavian.matmul(!) during this time, they will be very fast to start.
However, these tasks prevent new tasks from running on their threads during this time.

(Note that partr threads do the same thing, but the tasks Polyester runs on do run on these threads, so the interference is only one way.)
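The spin-then-sleep behaviour described above can be modelled with a small toy. This is only an illustration of the idea and not ThreadingUtilities' actual implementation: `wait_for_work`, `spin_limit`, and the channel-based wake-up are all made up for this sketch.

```julia
using Base.Threads: Atomic

# Poll `flag` up to `spin_limit` times. If work shows up while polling,
# return :spinning; otherwise block on `wakeup` and return :slept.
function wait_for_work(flag::Atomic{Bool}, wakeup::Channel{Nothing};
                       spin_limit::Int = 1_000)
    for _ in 1:spin_limit
        # Fast path: work is picked up with very low latency. While this
        # loop runs, the thread cannot run any other task.
        flag[] && return :spinning
    end
    # Slow path: give up the thread and block until someone wakes us.
    take!(wakeup)
    return :slept
end

# Work is already available, so the worker never goes to sleep.
busy = wait_for_work(Atomic{Bool}(true), Channel{Nothing}(1))

# No work arrives during the spin phase; a buffered wake-up signal is
# already waiting, so take! returns immediately after the spin loop.
wakeup = Channel{Nothing}(1)
put!(wakeup, nothing)
idle = wait_for_work(Atomic{Bool}(false), wakeup; spin_limit = 10)
```

A larger `spin_limit` means faster start-up for the next batch of work, but a longer window in which the thread is unavailable to other tasks, which is the trade-off discussed in this comment.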

You could try reducing the amount of time ThreadingUtilities' threads spend looking for work before they give up:
https://github.com/JuliaSIMD/ThreadingUtilities.jl/blob/dfdcc027d5eb7f4d5eb667c9b16344fc5f8f6d6f/src/threadtasks.jl#L23

But, I think you should use Threads.@threads instead in this case.
Basically, switching to FFTW and running it until ThreadingUtilities' tasks go to sleep will also make ThreadingUtilities' tasks slower to start up next time.
Still faster than Threads.@threads, but the advantage is diminished.

Of course, you might be running many @batch loops between FFTs, in which case only the first one has to wake up the tasks.

Author

tknopp commented Jan 23, 2022

Thanks a lot for the explanation. Then I'll stick with Threads.@threads for the moment.

Note that I am writing a pure Julia version of an established C library and benchmarking the two against each other. While the implementations are not exactly the same, this is a nice playground for multi-threaded high-performance computing in Julia.

Member

chriselrod commented Jan 23, 2022

I think I could add a function that puts them all to sleep.
You'd have to call it manually when transitioning. It'd be most likely to help (vs @threads) if you run at least several @batch loops between each time you put them to sleep, but it may be worth trying.

Author

tknopp commented Jan 23, 2022

That sounds interesting. Calling it manually would be no problem. If you could provide that, I could do a benchmark of both versions and report back.

@chriselrod
Member

There is now a ThreadingUtilities.sleep_all_tasks(), tested here:
https://github.com/JuliaSIMD/ThreadingUtilities.jl/blob/716d26f4650b0a4042b87413f9952646501949ff/test/staticarrays.jl#L82
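A rough sketch of how the manual transition might look in user code. The `axpy!` kernel, the sizes, and the loop count are hypothetical; only `ThreadingUtilities.sleep_all_tasks()` is taken from this thread. It assumes Polyester, ThreadingUtilities, and FFTW are all direct dependencies of the active environment.

```julia
using Polyester: @batch
using ThreadingUtilities
using FFTW

# Hypothetical Polyester-threaded kernel.
function axpy!(y, a, x)
    @batch for i in eachindex(y, x)
        y[i] += a * x[i]
    end
    return y
end

FFTW.set_num_threads(Threads.nthreads())
x = rand(ComplexF64, 2^16)
y = zero(x)

# Several @batch loops in a row: only the first has to wake the tasks.
for _ in 1:10
    axpy!(y, 0.5, x)
end

# Manual transition: put Polyester's tasks to sleep before handing the
# threads over to code that uses base threading (here, FFTW).
ThreadingUtilities.sleep_all_tasks()
z = fft(y)
```

Whether this beats plain `Threads.@threads` likely depends on how many `@batch` loops run between transitions, so it is worth benchmarking both variants.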
