
Matmul backup #6

Draft · wants to merge 2 commits into main

Conversation

@WhiffleFish (Member)

Reduce the alpha vector backup to a single matrix multiplication.

Current performance is quite poor relative to the main branch, but I think the idea may still be worth pursuing.
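For intuition, here is a minimal standalone sketch of the backup-as-matrix-multiply idea. The sizes, the matrix names `T` and `Z`, and the per-observation `β` construction are illustrative assumptions for a fixed action, not this PR's actual implementation:

```julia
using LinearAlgebra

# Tiny hypothetical POMDP pieces (for a single fixed action):
S, O = 4, 2                            # number of states / observations
T = rand(S, S); T ./= sum(T, dims=2)   # T[s, s′]: row-stochastic transitions
Z = rand(S, O); Z ./= sum(Z, dims=2)   # Z[s′, o]: observation likelihoods
α′ = rand(S)                           # next-step alpha vector

# Loop form of the backup term: α_o(s) = Σ_{s′} T[s, s′] * Z[s′, o] * α′[s′]
α_loop = zeros(S, O)
for o in 1:O, s in 1:S, s′ in 1:S
    α_loop[s, o] += T[s, s′] * Z[s′, o] * α′[s′]
end

# Matrix form: folding T and Z into β_o = T * Diagonal(Z[:, o]) reduces the
# same computation to one matrix-vector product per observation.
α_mat = hcat((T * Diagonal(Z[:, o]) * α′ for o in 1:O)...)

α_loop ≈ α_mat  # the two forms agree up to floating-point error
```

The point of the trick is that once the `β` matrices are precomputed, each backup becomes a dense or sparse mat-vec that can be dispatched to BLAS instead of scalar loops.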

Before

Tiger

BenchmarkTools.Trial: 558 samples with 1 evaluation.
 Range (min … max):  8.515 ms … 15.210 ms  ┊ GC (min … max): 0.00% … 41.33%
 Time  (median):     8.844 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.966 ms ± 833.336 μs  ┊ GC (mean ± σ):  1.29% ±  5.52%

    ▇█                                                         
  ▃▆██▇▄▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▂▂ ▂
  8.52 ms         Histogram: frequency by time        14.7 ms <

 Memory estimate: 2.00 MiB, allocs estimate: 13031.

RockSample(5,5)

BenchmarkTools.Trial: 25 samples with 1 evaluation.
 Range (min … max):  201.661 ms … 209.820 ms  ┊ GC (min … max): 0.00% … 2.03%
 Time  (median):     205.455 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   205.419 ms ±   2.128 ms  ┊ GC (mean ± σ):  0.25% ± 0.69%

  ▁      ▁  ▁█▁▁▁    ▁ ▁  ▁  ▁▁   ███ ▁           █▁   ▁      ▁  
  █▁▁▁▁▁▁█▁▁█████▁▁▁▁█▁█▁▁█▁▁██▁▁▁███▁█▁▁▁▁▁▁▁▁▁▁▁██▁▁▁█▁▁▁▁▁▁█ ▁
  202 ms           Histogram: frequency by time          210 ms <

 Memory estimate: 14.99 MiB, allocs estimate: 69304.

RockSample(10,10)

BenchmarkTools.Trial: 17 samples with 1 evaluation.
 Range (min … max):  293.601 ms … 303.135 ms  ┊ GC (min … max): 0.00% … 1.79%
 Time  (median):     297.885 ms               ┊ GC (median):    0.71%
 Time  (mean ± σ):   298.023 ms ±   3.036 ms  ┊ GC (mean ± σ):  0.85% ± 0.90%

  █ █  █     █ ██    █     █ █ ██       █       █ █    ██     █  
  █▁█▁▁█▁▁▁▁▁█▁██▁▁▁▁█▁▁▁▁▁█▁█▁██▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁█▁▁▁▁██▁▁▁▁▁█ ▁
  294 ms           Histogram: frequency by time          303 ms <

 Memory estimate: 64.93 MiB, allocs estimate: 82768.

After

Tiger

BenchmarkTools.Trial: 556 samples with 1 evaluation.
 Range (min … max):  8.625 ms … 16.160 ms  ┊ GC (min … max): 0.00% … 43.59%
 Time  (median):     8.874 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.002 ms ± 880.114 μs  ┊ GC (mean ± σ):  1.21% ±  5.39%

   █▅▁                                                         
  ▇███▅▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▂ ▂
  8.62 ms         Histogram: frequency by time        15.7 ms <

 Memory estimate: 2.00 MiB, allocs estimate: 13043.

RockSample(5,5)

BenchmarkTools.Trial: 25 samples with 1 evaluation.
 Range (min … max):  200.425 ms … 208.799 ms  ┊ GC (min … max): 0.00% … 2.23%
 Time  (median):     203.231 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   203.680 ms ±   2.312 ms  ┊ GC (mean ± σ):  0.34% ± 0.78%

                  █            ▃   █                          ▃  
  ▇▁▇▇▇▁▁▁▇▁▇▁▇▇▁▁█▇▁▁▇▇▁▁▁▇▇▁▁█▁▁▁█▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁█ ▁
  200 ms           Histogram: frequency by time          209 ms <

 Memory estimate: 22.43 MiB, allocs estimate: 69343.

RockSample(10,10)

BenchmarkTools.Trial: 17 samples with 1 evaluation.
 Range (min … max):  298.811 ms … 305.206 ms  ┊ GC (min … max): 0.71% … 1.33%
 Time  (median):     302.556 ms               ┊ GC (median):    1.37%
 Time  (mean ± σ):   302.161 ms ±   1.666 ms  ┊ GC (mean ± σ):  1.45% ± 0.33%

                    ▃                █        ▃                  
  ▇▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁█▁▁▇▁▇▁▁▁▇▁▁▁▁▁▁▁█▇▇▁▇▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▇ ▁
  299 ms           Histogram: frequency by time          305 ms <

 Memory estimate: 182.57 MiB, allocs estimate: 82816.


codecov bot commented Feb 25, 2023

Codecov Report

Base: 99.27% // Head: 99.30% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (b1486f3) compared to base (29ed7e9).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head b1486f3 differs from pull request most recent head dc5f766. Consider uploading reports for the commit dc5f766 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main       #6      +/-   ##
==========================================
+ Coverage   99.27%   99.30%   +0.02%     
==========================================
  Files          12       12              
  Lines         690      716      +26     
==========================================
+ Hits          685      711      +26     
  Misses          5        5              
Impacted Files Coverage Δ
src/tree.jl 100.00% <ø> (ø)
src/backup.jl 100.00% <100.00%> (ø)
src/cache.jl 100.00% <100.00%> (ø)


@zsunberg (Member)

Seems like there are a lot of allocs. Can you do it more in-place somehow?

@WhiffleFish (Member, Author)

The vast majority of allocations occur when generating new beliefs and storing them in the tree. This could be improved by preallocating beliefs at the very beginning or by using something like a bump allocator around the belief updater, but I'm not sure it's worth the effort.

Currently, allocations have a negligible effect on performance: in profiles, elementary non-allocating array operations like getindex and setindex! account for the vast majority of computation time.

As far as I can tell, the best ways of improving performance currently would be through #2, #3, and #7.

That said, the current implementation is already quite fast, so these improvements aren't strictly necessary.

@rejuvyesh (Member)

What matrix sizes are you evaluating on? It matters more once your matrices are larger than 50×50.

@WhiffleFish (Member, Author)

RockSample(10,10) has 801 states, so there we're working with an 801 × 801 transition matrix.

@rejuvyesh (Member)

If I understand the code correctly, T[sp, s]*Zᵀ[o, sp] is the actual matrix multiplication? And you are still looping through things. I think unless we rewrite those loops so that the result comes from a few BLAS calls, there won't be any benefit to this.

@WhiffleFish (Member, Author)

Right, the line you're referencing only populates the linear mapping matrix ($\beta$) once, before any planning begins. The actual alpha vector backup using this sparse matrix is done here:

@inline function backup_a!(α, pomdp, a, β::AbstractArray{<:Number}, Γv)
    R = @view pomdp.R[:, a]   # reward column for action a, no copy
    γ = discount(pomdp)
    mul!(α, β, Γv)            # α ← β * Γv, written in place
    return @. α = R + γ * α   # fused in-place broadcast: α ← R + γα
end

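As a standalone sanity check of that in-place pattern, here is a sketch with invented stand-in data; `β`, `Γv`, `R`, and `γ` below are hypothetical placeholders mirroring the function's arguments, not the package's actual types:

```julia
using LinearAlgebra

S, K = 5, 10
β  = randn(S, K)   # stands in for the sparse linear-mapping matrix
Γv = randn(K)      # stacked values from the best next-step alpha vectors
R  = randn(S)      # immediate-reward column for the chosen action
γ  = 0.95
α  = similar(R)    # preallocated output buffer

mul!(α, β, Γv)     # α ← β * Γv without allocating an intermediate
@. α = R + γ * α   # fused in-place broadcast: α ← R + γ * α

α ≈ R .+ γ .* (β * Γv)  # agrees with the allocating reference computation
```

So the per-backup work is one `mul!` plus one fused broadcast, which is why remaining allocations come from belief generation rather than from this kernel.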
