Redesign Multi-threading #159
I had a request from Noam Bernstein to allow setting the number of threads in each call to `forces`, etc.
Adding comments on the implementation:

1. From the Julia documentation: `Threads.threadid()` is not guaranteed to stay constant within a loop iteration, because tasks can be scheduled dynamically across threads. So, using it to get an address to temporary data is a possible source of bugs! The correct way to implement temporary data is to use Channels: create a Channel that holds the temporary data, pull a data instance from the Channel at the beginning of each iteration, and put the instance back into the Channel once the task is done.

2. Setting the number of threads for each call to `forces` needs a special implementation. FoldsThreads.jl has executors for this, which means using either Folds.jl or FLoops.jl. The advantage here is that there are several different executors available that have different strengths. This has several potential use cases. Like, we could use … (a sketch of the executor idea follows below).
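To illustrate point 2, here is a minimal sketch of per-call executor selection with FLoops.jl. `site_force` and `total_force` are made-up stand-ins rather than JuLIP functions, and `basesize` is just one knob for limiting how many tasks a call spawns:

```julia
using FLoops  # re-exports ThreadedEx and SequentialEx from Transducers.jl

site_force(i) = sin(i)  # dummy stand-in for the real per-site work

# The executor is a keyword argument, so every call can pick its own parallelism.
function total_force(n; ex = ThreadedEx())
    @floop ex for i in 1:n
        @reduce(F += site_force(i))
    end
    return F
end

total_force(10_000)                                     # default threaded run
total_force(10_000; ex = SequentialEx())                # serial, same code path
total_force(10_000; ex = ThreadedEx(basesize = 2_500))  # at most ~4 chunks/tasks
```

FoldsThreads.jl adds further executors (e.g. work-stealing and depth-first ones) that plug into the same `ex` slot, which is where the different advantages come in.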
On 2 - this sounds great, I'm looking forward to learning more. On 1 - this is a bit cryptic for me, let me see if I understand it correctly: the problem is that a thread could be paused in the middle of executing the body of a for-loop, and then the same thread could start a new iteration of the for-loop body. So now I'm trying to write into the same temporary arrays at the same time. Correct? This would indeed be entirely unexpected for me, and yes, that could cause plenty of bugs. But I don't like the consequence of this, which seems to be that I need far more temporary arrays than threads. And we will end up allocating a lot, no? Are the Channels you are referring to the same as what I call ObjectPools? Or is this something else entirely?
I now also think, if I understood you correctly in 1, that my temporary arrays in … are also unsafe.
Basically what can happen is that:

1. a task pauses in the middle of executing a loop iteration,
2. the scheduler runs another iteration of the same loop on that thread,
3. both iterations now see the same `Threads.threadid()` and thus write into the same temporary arrays.

Another problem here is that the behaviour can depend on the Julia version, so there is no trusting what can be done with it.

Channels

Channels are a Julia construct for communicating between tasks. The documentation is in the asynchronous computing section, but Channels are completely thread safe and can be used with threads. There is also `RemoteChannel` for distributed computing. Here is an example of how to use Channels:

```julia
julia> c = Channel(4)
Channel{Any}(4) (empty)

julia> for i in 1:4
           put!(c, i)
       end

julia> Threads.@threads for i in 1:8
           t = take!(c)
           println(i, " ", Threads.threadid(), ",", t)
           put!(c, t)
           sleep(rand())
       end
1 3,3
3 2,2
7 4,4
5 1,1
8 4,3
2 3,2
6 3,4
4 2,1
```

So, allocations are done only once, and only as many instances as there are (worker) threads are allocated. In fact, now that I think about it, we should probably replace …
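As a minimal sketch (names and sizes are illustrative only), replacing `threadid()`-indexed temporary arrays with a Channel-backed pool could look like this:

```julia
using Random

# One preallocated scratch buffer per worker thread, held in a Channel.
pool = Channel{Vector{Float64}}(Threads.nthreads())
for _ in 1:Threads.nthreads()
    put!(pool, Vector{Float64}(undef, 1000))
end

results = zeros(100)
Threads.@threads for i in 1:100
    tmp = take!(pool)       # borrow a buffer; blocks if all are in use
    rand!(tmp)              # stand-in for filling per-iteration temporaries
    results[i] = sum(tmp)   # each iteration writes only to its own slot
    put!(pool, tmp)         # return the buffer for the next iteration
end
```

Unlike `threadid()` indexing, correctness here does not depend on which thread ends up running which iteration.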
Thank you for the explanation.
When did this dynamic scheduling start? And is there a way to avoid it? I'll need to send around a warning to people who are using multi-threading with ACE / JuLIP. EDIT: I see that I simply need to tag new versions with the …
Yes, it will work as a temporary fix.
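For reference, one way to pin iterations to threads in current Julia is the `:static` schedule of `Threads.@threads` — assuming schedule pinning is the kind of temporary fix meant here:

```julia
# Under :static each thread runs a fixed, contiguous block of iterations and
# iterations never migrate, so threadid()-indexed temporaries stay safe.
temps = [zeros(1000) for _ in 1:Threads.nthreads()]

Threads.@threads :static for i in 1:100
    tmp = temps[Threads.threadid()]  # safe only with the :static schedule
    # ... fill and use tmp as scratch space ...
end
```

This matches the old behaviour of `Threads.@threads`; newer Julia versions default to the `:dynamic` schedule.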
I think it was either v1.6 or v1.7 that introduced dynamic load balancing.
I tested the Channel overhead:

```julia
julia> using BenchmarkTools

julia> N = 100
100

julia> c = Channel(N)
Channel{Any}(100) (empty)

julia> for i in 1:N; put!(c, rand(3)) end

julia> b = @benchmarkable take!(c) samples=N
Benchmark(evals=1, seconds=5.0, samples=100)

julia> run(b)
BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  41.000 ns … 1.070 ms   ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     83.000 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.798 μs ± 107.003 μs ┊ GC (mean ± σ):  0.00% ± 0.00%

  █
  █▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▄
  41 ns          Histogram: log(frequency) by time        3.5 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> p = @benchmarkable put!(c, rand(3)) samples=N
Benchmark(evals=1, seconds=5.0, samples=100)

julia> run(p)
BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  83.000 ns … 14.083 μs ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     84.000 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   275.830 ns ± 1.441 μs ┊ GC (mean ± σ):  0.00% ± 0.00%

  █
  █▅▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▄
  83 ns          Histogram: log(frequency) by time       3.71 μs <

 Memory estimate: 80 bytes, allocs estimate: 1.
```

To me this sounds like an acceptable overhead.
Adding a threaded version of the above:

```julia
julia> function f(c, n)
           Threads.@threads for i in 1:n
               t = take!(c)
               put!(c, t)
           end
       end
f (generic function with 1 method)

julia> @benchmark f(c, N)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  15.875 μs … 1.495 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     47.417 μs             ┊ GC (median):    0.00%
 Time  (mean ± σ):   56.133 μs ± 45.093 μs ┊ GC (mean ± σ):  0.00% ± 0.00%

              ▃▅▇▅▄▅▇█▅▄▃▂
  ▁▃▆▆▇████████████▇▇▅▅▄▄▄▅▆▅▅▅▄▄▄▄▃▃▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  15.9 μs        Histogram: frequency by time          158 μs <

 Memory estimate: 2.14 KiB, allocs estimate: 25.
```
Thanks for this, I'll think about what a realistic test might look like.
Coming back to this - there are three scenarios: …
To test this, why not try it out by multi-threading the evaluation of an EAM or SW potential? That's already in JuLIP. This is the worst-case scenario, and if that scales well then I think we are fine. In the meantime I'll finish a prototype of the new ACE evaluator, and you can then try it out with that as well?
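A rough sketch of the kind of worst-case test this suggests — threading a cheap pair potential over sites. This is not JuLIP's actual API, just an illustrative harness (a naive O(N²) sum with no cutoff):

```julia
# Lennard-Jones-like pair interaction, written in squared-distance form.
pairpot(r2) = r2^-6 - r2^-3

function site_energies(x::Vector{NTuple{3,Float64}})
    E = zeros(length(x))
    Threads.@threads for i in eachindex(x)
        acc = 0.0                       # per-iteration accumulator, nothing shared
        for j in eachindex(x)
            i == j && continue
            acc += pairpot(sum(abs2, x[i] .- x[j]))
        end
        E[i] = acc / 2                  # split each pair between its two sites
    end
    return E
end

x = [ntuple(_ -> 10rand(), 3) for _ in 1:1_000]
@time site_energies(x)   # compare timings across different thread counts
```

Because each iteration is cheap and writes only to its own slot of `E`, any poor scaling here should mostly reflect scheduling overhead rather than the physics.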
cc @tjjarvinen