
Matmul backup #6

Draft · wants to merge 2 commits into main

Conversation

@WhiffleFish (Member)

Reduce the alpha vector backup to a single matrix multiplication.

Current performance is quite poor relative to the main branch, but I think the idea may still be worth pursuing.
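For intuition, here is a minimal standalone sketch of the backup-as-matrix-multiply idea. The sizes, the matrix names `T` and `Z`, and the per-observation `β` construction are illustrative assumptions for a fixed action, not this PR's actual implementation:

```julia
using LinearAlgebra

# Tiny hypothetical POMDP pieces (for a single fixed action):
S, O = 4, 2                            # number of states / observations
T = rand(S, S); T ./= sum(T, dims=2)   # T[s, s′]: row-stochastic transitions
Z = rand(S, O); Z ./= sum(Z, dims=2)   # Z[s′, o]: observation likelihoods
α′ = rand(S)                           # next-step alpha vector

# Loop form of the backup term: α_o(s) = Σ_{s′} T[s, s′] * Z[s′, o] * α′[s′]
α_loop = zeros(S, O)
for o in 1:O, s in 1:S, s′ in 1:S
    α_loop[s, o] += T[s, s′] * Z[s′, o] * α′[s′]
end

# Matrix form: folding T and Z into β_o = T * Diagonal(Z[:, o]) reduces the
# same computation to one matrix-vector product per observation.
α_mat = hcat((T * Diagonal(Z[:, o]) * α′ for o in 1:O)...)

α_loop ≈ α_mat  # the two forms agree up to floating-point error
```

The point of the trick is that once the `β` matrices are precomputed, each backup becomes a dense or sparse mat-vec that can be dispatched to BLAS instead of scalar loops.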

Before

Tiger

BenchmarkTools.Trial: 558 samples with 1 evaluation.
 Range (min … max):  8.515 ms … 15.210 ms  ┊ GC (min … max): 0.00% … 41.33%
 Time  (median):     8.844 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.966 ms ± 833.336 μs  ┊ GC (mean ± σ):  1.29% ±  5.52%

    ▇█                                                         
  ▃▆██▇▄▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▂▂ ▂
  8.52 ms         Histogram: frequency by time        14.7 ms <

 Memory estimate: 2.00 MiB, allocs estimate: 13031.

RockSample(5,5)

BenchmarkTools.Trial: 25 samples with 1 evaluation.
 Range (min … max):  201.661 ms … 209.820 ms  ┊ GC (min … max): 0.00% … 2.03%
 Time  (median):     205.455 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   205.419 ms ±   2.128 ms  ┊ GC (mean ± σ):  0.25% ± 0.69%

  ▁      ▁  ▁█▁▁▁    ▁ ▁  ▁  ▁▁   ███ ▁           █▁   ▁      ▁  
  █▁▁▁▁▁▁█▁▁█████▁▁▁▁█▁█▁▁█▁▁██▁▁▁███▁█▁▁▁▁▁▁▁▁▁▁▁██▁▁▁█▁▁▁▁▁▁█ ▁
  202 ms           Histogram: frequency by time          210 ms <

 Memory estimate: 14.99 MiB, allocs estimate: 69304.

RockSample(10,10)

BenchmarkTools.Trial: 17 samples with 1 evaluation.
 Range (min … max):  293.601 ms … 303.135 ms  ┊ GC (min … max): 0.00% … 1.79%
 Time  (median):     297.885 ms               ┊ GC (median):    0.71%
 Time  (mean ± σ):   298.023 ms ±   3.036 ms  ┊ GC (mean ± σ):  0.85% ± 0.90%

  █ █  █     █ ██    █     █ █ ██       █       █ █    ██     █  
  █▁█▁▁█▁▁▁▁▁█▁██▁▁▁▁█▁▁▁▁▁█▁█▁██▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁█▁▁▁▁██▁▁▁▁▁█ ▁
  294 ms           Histogram: frequency by time          303 ms <

 Memory estimate: 64.93 MiB, allocs estimate: 82768.

After

Tiger

BenchmarkTools.Trial: 556 samples with 1 evaluation.
 Range (min … max):  8.625 ms … 16.160 ms  ┊ GC (min … max): 0.00% … 43.59%
 Time  (median):     8.874 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.002 ms ± 880.114 μs  ┊ GC (mean ± σ):  1.21% ±  5.39%

   █▅▁                                                         
  ▇███▅▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▂ ▂
  8.62 ms         Histogram: frequency by time        15.7 ms <

 Memory estimate: 2.00 MiB, allocs estimate: 13043.

RockSample(5,5)

BenchmarkTools.Trial: 25 samples with 1 evaluation.
 Range (min … max):  200.425 ms … 208.799 ms  ┊ GC (min … max): 0.00% … 2.23%
 Time  (median):     203.231 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   203.680 ms ±   2.312 ms  ┊ GC (mean ± σ):  0.34% ± 0.78%

                  █            ▃   █                          ▃  
  ▇▁▇▇▇▁▁▁▇▁▇▁▇▇▁▁█▇▁▁▇▇▁▁▁▇▇▁▁█▁▁▁█▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁█ ▁
  200 ms           Histogram: frequency by time          209 ms <

 Memory estimate: 22.43 MiB, allocs estimate: 69343.

RockSample(10,10)

BenchmarkTools.Trial: 17 samples with 1 evaluation.
 Range (min … max):  298.811 ms … 305.206 ms  ┊ GC (min … max): 0.71% … 1.33%
 Time  (median):     302.556 ms               ┊ GC (median):    1.37%
 Time  (mean ± σ):   302.161 ms ±   1.666 ms  ┊ GC (mean ± σ):  1.45% ± 0.33%

                    ▃                █        ▃                  
  ▇▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁█▁▁▇▁▇▁▁▁▇▁▁▁▁▁▁▁█▇▇▁▇▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▇ ▁
  299 ms           Histogram: frequency by time          305 ms <

 Memory estimate: 182.57 MiB, allocs estimate: 82816.


codecov bot commented Feb 25, 2023

Codecov Report

Base: 99.27% // Head: 99.30% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (b1486f3) compared to base (29ed7e9).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head b1486f3 differs from pull request most recent head dc5f766. Consider uploading reports for the commit dc5f766 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main       #6      +/-   ##
==========================================
+ Coverage   99.27%   99.30%   +0.02%     
==========================================
  Files          12       12              
  Lines         690      716      +26     
==========================================
+ Hits          685      711      +26     
  Misses          5        5              
Impacted Files Coverage Δ
src/tree.jl 100.00% <ø> (ø)
src/backup.jl 100.00% <100.00%> (ø)
src/cache.jl 100.00% <100.00%> (ø)


@zsunberg (Member)

Seems like there are a lot of allocs. Can you do it more in-place somehow?

@WhiffleFish (Member, Author)

The vast majority of allocations occur when generating new beliefs and storing them in the tree. This could be improved by preallocating beliefs at the very beginning or by using something like a bump allocator around the belief updater, but I'm not sure it's worth the effort.

Currently, allocations have a negligible effect on performance: in profiles, elementary non-allocating array operations like getindex and setindex! account for the vast majority of computation time.

As far as I can tell, the best ways of improving performance currently would be through #2, #3, and #7.

That said, the current implementation is already quite fast, so these improvements aren't strictly necessary.

@rejuvyesh (Member)

What matrix sizes are you evaluating on? It matters more once your matrices are larger than 50×50.

@WhiffleFish (Member, Author)

RockSample(10,10) has 801 states, so there we're working with an 801 × 801 transition matrix.

@rejuvyesh (Member)

If I understand the code correctly, T[sp, s]*Zᵀ[o, sp] is the actual matrix multiplication? And you are still looping through things. I think unless we rewrite those loops so that the result comes from a few BLAS calls, there won't be any benefit to this.

@WhiffleFish (Member, Author)

Right, the line you're referencing only populates the linear mapping matrix ($\beta$) once, before any planning begins. The actual alpha vector backup using this sparse matrix is done here:

@inline function backup_a!(α, pomdp, a, β::AbstractArray{<:Number}, Γv)
    R = @view pomdp.R[:, a]   # reward column for action a, no copy
    γ = discount(pomdp)
    mul!(α, β, Γv)            # α ← β * Γv, written in place
    return @. α = R + γ * α   # fused in-place broadcast: α ← R + γα
end

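As a standalone sanity check of that in-place pattern, here is a sketch with invented stand-in data; `β`, `Γv`, `R`, and `γ` below are hypothetical placeholders mirroring the function's arguments, not the package's actual types:

```julia
using LinearAlgebra

S, K = 5, 10
β  = randn(S, K)   # stands in for the sparse linear-mapping matrix
Γv = randn(K)      # stacked values from the best next-step alpha vectors
R  = randn(S)      # immediate-reward column for the chosen action
γ  = 0.95
α  = similar(R)    # preallocated output buffer

mul!(α, β, Γv)     # α ← β * Γv without allocating an intermediate
@. α = R + γ * α   # fused in-place broadcast: α ← R + γ * α

α ≈ R .+ γ .* (β * Γv)  # agrees with the allocating reference computation
```

So the per-backup work is one `mul!` plus one fused broadcast, which is why remaining allocations come from belief generation rather than from this kernel.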
