Skip to content

Latest commit

 

History

History
231 lines (174 loc) · 12.3 KB

README.md

File metadata and controls

231 lines (174 loc) · 12.3 KB

TriangularSolve

Stable Dev Build Status Coverage

Performs some triangular solves. For example:

julia> using TriangularSolve, LinearAlgebra, MKL;

julia> BLAS.set_num_threads(1)

julia> BLAS.get_config().loaded_libs
1-element Vector{LinearAlgebra.BLAS.LBTLibraryInfo}:
 LBTLibraryInfo(libmkl_rt.so, ilp64)

julia> N = 100;

julia> A = rand(N,N); B = rand(N,N); C = similar(A);

julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  15.909 μs   41.524 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     17.916 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.751 μs ± 697.786 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▁    ▁    ▁     ▄▁    ▇▆    ▆█▃                             ▂
  ██▃▁▁██▁▁▁▁█▆▁▁▃▇██▄▃▁███▆▁▄▄███▄▄▅▅▆▇█▇▄▅▆▇██▇█▇▇▆▄▅▄▁▄▁▄▄▇ █
  15.9 μs       Histogram: log(frequency) by time      19.9 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  17.578 μs  75.835 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     19.852 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.827 μs ±  1.342 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▂              ▂    ▆▅   ▁█▇▂   ▅▃     ▂                   ▂
  ██▁▁▃█▇▁▁▁█▇▄▄▁██▇▄▄▄██▆▅▄████▅▄▆██▆▆▆▆▇██▇▇▆▆▇▆▅▆▄▅▅▆▄▅▄▅▅ █
  17.6 μs      Histogram: log(frequency) by time      22.4 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  19.102 μs   69.966 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     21.561 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.565 μs ± 890.952 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▂                 ▂▃     ▄▄     ▆█▄     ▅▅                  ▂
  ██▃▁▁▁▇█▁▁▁▁▅█▁▁▁▁▁██▅▁▁▁▅██▆▁▁▁▆███▆▅▃▅████▃▄▅██▇▇▅▆▆▇▇█▇▆▆ █
  19.1 μs       Histogram: log(frequency) by time      23.4 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  19.082 μs   39.078 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     19.694 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.765 μs ± 774.848 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃        ▄█         ▁
  ▂▇██▄▂▁▁▂▂▃███▃▂▁▂▁▂▂▅█▇▃▂▂▂▁▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂ ▃
  19.1 μs         Histogram: frequency by time         22.1 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Multithreaded benchmarks:

julia> BLAS.set_num_threads(min(Threads.nthreads(), TriangularSolve.VectorizationBase.num_cores()))

julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
 Range (min  max):  8.309 μs   24.357 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     8.769 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.812 μs ± 382.702 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▁▃▄▆▆██▇▆▅▃▁
  ▂▁▂▂▂▂▃▃▃▄▅▇██████████████▇▆▅▄▃▃▃▃▃▂▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▄
  8.31 μs         Histogram: frequency by time         9.7 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  11.996 μs  151.147 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     14.163 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.281 μs ±   2.372 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▂▄▇███▇▆▅▃▂ ▁   ▂▄▄▅▅▅▆▃▃         ▁
  ▁▁▁▂▂▃▄▇██████████████████████████▇▆▅▄▅▆▇███▆▅▅▃▄▂▂▂▁▁▁▁▁▁▁▁ ▅
  12 μs           Histogram: frequency by time         17.3 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min  max):  7.903 μs   22.442 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     9.871 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.789 μs ± 864.957 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▃  ▄▃  ▃▅   ▅▃   ▆▂   ▆▄   ▂▇▄   ▃█▅▂▂▁▁▄▆▃▁ ▁             ▂
  ██▅▂██▆▅██▆▆▆██▇▇███▇▇▇████▇█████▆██████████████▇███▇▇▆▇▆▅▆ █
  7.9 μs       Histogram: log(frequency) by time      11.8 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  13.507 μs  142.574 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     15.258 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.319 μs ±   2.045 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▁▃   ▁▂   ▁▃▅▁  ▁▄▄▁  ▂▆█▆▃
  ▁▂▅███▆▇███▆▅████▆▅████▆▆█████▆▄▄▆▆▅▄▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁ ▄
  13.5 μs         Histogram: frequency by time         18.5 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> versioninfo()
Julia Version 1.8.0-DEV.438
Commit 88a6376e99* (2021-08-28 11:03 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake)
Environment:
  JULIA_NUM_THREADS = 8

Single-threaded benchmarks on an M1 mac:

julia> N = 100;

julia> A = rand(N,N); B = rand(N,N); C = similar(A);

julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  21.416 μs   34.458 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     21.624 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.767 μs ± 491.788 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃ ▆██ ▆▄ ▁                 ▃▄ ▄▂          ▁            ▂▃▁ ▂
  ▃▇█▁███▁██▁█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▃█▁██▁███▁▆▃▁▁▆▇▁██▁█▆▅▁▄▃▁▃▃▇▁███ █
  21.4 μs       Histogram: log(frequency) by time      23.2 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  39.124 μs  57.749 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     46.166 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.274 μs ±  1.766 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                              ▁▁▄▂▆▃█▅▇▄▇▅▃▃▁▃▁▂               
  ▂▁▁▂▂▂▂▂▁▂▂▂▂▂▂▃▃▃▃▃▄▄▅▅▆▅▇▇████████████████████▆▇▆▆▅▆▅▅▄▃▃ ▅
  39.1 μs         Histogram: frequency by time        50.2 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  48.291 μs   57.833 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     49.124 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   49.306 μs ± 802.143 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁▃▅▆▇██▇██▇▇▆▅▄▂▂▁▁▁▂▁▁▁▁▁▁▁ ▁▁▁                           ▃
  ▃████████████████████████████████████▇▆▄▂▄▃▂▃▃▄▄▃▆▅▇▇▇██▇█▇▇ █
  48.3 μs       Histogram: log(frequency) by time        53 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  34.249 μs   40.208 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     34.375 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   34.748 μs ± 774.675 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆██▆▃▄▅▃                ▁▁▄▅▅▃▂▁                     ▂▃▂  ▁▂ ▂
  ████████▁▁▃▁▁▁▁▁▃▄▃▁▁▃██████████▇▅▄▅▅▆▄▄▄▄▄▅▄▄▃▅▃▄▃▅█████▇██ █
  34.2 μs       Histogram: log(frequency) by time      37.1 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Or

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  23.750 μs   30.541 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     23.875 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   23.948 μs ± 316.293 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃▁▆ █ ▇▆▆ ▄ ▁                               ▁ ▁         ▁ ▁ ▂
  ▅███▆█▁███▄█▁██▇▁▄▁▁▁▁▁▃▁▁▁▁▁▁▁▃▁▁▁▃▁▁▁▁▁▆▁▇▆█▁█▁▇▆▅▁▅▁▇▆█▁█ █
  23.8 μs       Histogram: log(frequency) by time        25 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

For editing convenience (you can copy/paste the above into a REPL and it should automatically strip julia> s and outputs, but the above is less convenient to edit if you want to try changing the benchmarks):

using TriangularSolve, LinearAlgebra, MKL;
BLAS.set_num_threads(Threads.nthreads())
BLAS.get_config().loaded_libs
N = 100;

A = rand(N,N); B = rand(N,N); C = similar(A);

@benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false))
@benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))

@benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false))
@benchmark ldiv!($C, LowerTriangular($B), $A)

BLAS.set_num_threads(TriangularSolve.VectorizationBase.num_cores())
@benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B))
@benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))

@benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A)
@benchmark ldiv!($C, LowerTriangular($B), $A)

versioninfo()

Currently, rdiv! with UpperTriangular and ldiv! with LowerTriangulra matrices are the only supported configurations.