(MWE) Changing `RockSample` actions from `Int` to `Action` leads to incredibly different policies #489

jmuchovej · 2023-05-11T17:44:48Z

I started adapting RockSample's implementation to mirror the setup for a problem I've been working on. Namely, I replaced RockSample's `Int`-based actions with `Action{type}`, which allows dispatching on `Action{type}` instead of using control flow to check what kind of action it is. (It also gives more intuitive action names, which is mostly a dev-ex thing rather than core to the research. 😅)
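As a rough illustration of the two styles, here is a minimal sketch. `Action{T}`, `describe`, and `DIRECTIONS` are hypothetical names, and the `Int` encoding (1 = sample, 2-5 = moves, 6+ = check a rock) is an assumption loosely modeled on RockSample; the forked code may look different.

```julia
# Hypothetical stand-in for the forked `Action{type}`; the real definition may differ.
struct Action{T}
    arg::Int  # direction index for :move, rock index for :check, unused for :sample
end

const DIRECTIONS = ("north", "east", "south", "west")

# Int style (assumed encoding): 1 = sample, 2-5 = moves, 6+ = check rock (a - 5).
# The action kind is recovered with control flow.
function describe(a::Int)
    if a == 1
        return "sample"
    elseif a <= 5
        return "move " * DIRECTIONS[a - 1]
    else
        return "check rock $(a - 5)"
    end
end

# Action{type} style: the kind lives in the type parameter, so method
# dispatch selects the right method instead of an if/elseif chain.
describe(::Action{:sample}) = "sample"
describe(a::Action{:move}) = "move " * DIRECTIONS[a.arg]
describe(a::Action{:check}) = "check rock $(a.arg)"
```

Here `describe(3)` and `describe(Action{:move}(2))` both return `"move east"`; the typed version moves the branching out of the function body and into method selection.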

Due to the magnitude of the changes, I've forked RockSample and committed them.

The script I've been using (mwe.jl) is in that repository as well, but the meat-and-potatoes of it is:

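```julia
using POMDPs, POMDPTools, RockSample
import SARSOP, NativeSARSOP  # both define SARSOPSolver, so qualify the name

rs_pomdp = RockSamplePOMDP()
# Pick one solver; swap the comment to compare the two implementations.
rs_solver = NativeSARSOP.SARSOPSolver()
# rs_solver = SARSOP.SARSOPSolver()
rs_policy = solve(rs_solver, rs_pomdp)

# b0: the default initial belief.
rs_b0 = initialize_belief(updater(rs_policy), initialstate(rs_pomdp))
# b1: half the mass on the first state, the rest spread uniformly.
rs_b1 = initialstate(rs_pomdp)
rs_b1.probs[1] = 0.5
rs_b1.probs[2:end] .= 0.5 / (length(rs_b1) - 1)
rs_b1 = initialize_belief(updater(rs_policy), rs_b1)

@show actionvalues(rs_policy, rs_b0)
@show actionvalues(rs_policy, rs_b1)
```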
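For reference, the arithmetic behind `rs_b1` can be checked on a plain probability vector. This is a minimal sketch: `skewed_belief` is a hypothetical helper, not part of POMDPs.jl or RockSample.jl, and `n = 8` is chosen only for illustration.

```julia
# Hypothetical helper mirroring how `rs_b1` is built: put `p_first` on the
# first state and split the remaining mass evenly over the other n - 1 states.
function skewed_belief(n::Integer; p_first::Float64 = 0.5)
    n >= 2 || throw(ArgumentError("need at least two states"))
    probs = Vector{Float64}(undef, n)
    probs[1] = p_first
    probs[2:end] .= (1 - p_first) / (n - 1)
    return probs
end

b = skewed_belief(8)   # illustrative size only
sum(b)                 # ≈ 1.0, so b is still a valid distribution
```

With `p_first = 0.5`, the per-state mass `(1 - p_first) / (n - 1)` reduces to the `0.5 / (length(rs_b1) - 1)` used in the script.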

Is there a known reason why this might happen? Moving the action type from `Int` (`<: Number`) to `Action` appears to generate wildly different policies, and the effect appears to be solver-independent.

Outputs:

```
# Using NativeSARSOP.SARSOPSolver on RockSample{RSState, Action, Int}
actionvalues(rs_policy, rs_b0) = [-Inf, -Inf, 8.1450625, -Inf, -Inf, -Inf, -Inf, -Inf]
actionvalues(rs_policy, rs_b1) = [-Inf, -Inf, 8.1450625, -Inf, -Inf, -Inf, -Inf, -Inf]

# Using SARSOP.SARSOPSolver on RockSample{RSState, Action, Int}
actionvalues(rs_policy, rs_b0) = [0.0, 112.12462249999999, 112.12462249999999, -Inf, 1.4968412100000001, 148.25917125, 3.869019999999999, 5.6147487499999995]
actionvalues(rs_policy, rs_b1) = [-85.7142857142857, 66.77228, 66.77228, -Inf, 3.3346878342857145, 87.28554214285714, 1.5705742857142866, 6.497632142857142]

# Using NativeSARSOP.SARSOPSolver on RockSample{RSState, Int, Int}
actionvalues(rs_policy, rs_b0) = [12.488632180920355, 13.145928611495112, 13.10140214786194, 16.034742813047465, 16.034742813047465, 16.926416376397345, 7.5443641406249995, 7.5443641406249995]
actionvalues(rs_policy, rs_b1) = [5.501226292362421, 10.302042713764955, 10.255118671320767, 11.600424860069094, 11.600424860069094, 12.322817261206417, 7.507160730271352, 7.500150138058002]

# Using SARSOP.SARSOPSolver on RockSample{RSState, Int, Int}
actionvalues(rs_policy, rs_b0) = [12.4886425, 13.145922500000001, 10.76643, 16.03475, 16.03475, 16.926436250000002, 11.871007500000001, 12.59692]
actionvalues(rs_policy, rs_b1) = [5.501234285714284, 10.302037142857142, 9.327171428571429, 11.600428571428571, 11.600428571428571, 12.322830714285713, 9.909788571428573, 10.446048571428571]
```

lassepe · 2023-05-11T18:04:33Z (Maintainer)

It looks like you forgot to post the output?

Reply from jmuchovej (Author) · May 11, 2023

I wasn't sure whether to post it, since I don't think the results make the problem immediately obvious, but I've edited the post to include the output.

Reply from jmuchovej (Author) · May 11, 2023

I also realized there was a typo in my snippet of the `mwe.jl` script. 😅

jmuchovej · 2023-05-11T21:53:42Z (Author)

Turns out, I mistranslated one of the transition steps! No erroneous behavior now that it's sorted. 😅
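A mistranslation like this can often be caught by stepping the old and new encodings side by side over every state and action. The sketch below is a toy: the 1-D corridor, `Move{T}`, `step_int`, `step_typed`, and `INT_TO_TYPED` are all hypothetical names invented for illustration, not the RockSample code.

```julia
# Toy 1-D corridor standing in for the RockSample grid (hypothetical model).
const WIDTH = 5

# Original encoding (assumed): 1 = stay, 2 = left, 3 = right.
function step_int(x::Int, a::Int)
    a == 2 && return max(x - 1, 1)
    a == 3 && return min(x + 1, WIDTH)
    return x
end

# Translated encoding: one type per action kind.
struct Move{T} end
step_typed(x::Int, ::Move{:stay}) = x
step_typed(x::Int, ::Move{:left}) = max(x - 1, 1)
step_typed(x::Int, ::Move{:right}) = min(x + 1, WIDTH)

# Mapping between the two encodings.
const INT_TO_TYPED = Dict(1 => Move{:stay}(), 2 => Move{:left}(), 3 => Move{:right}())

# Return every (state, Int action) pair where the two transitions disagree.
function mismatches()
    bad = Tuple{Int,Int}[]
    for x in 1:WIDTH, a in keys(INT_TO_TYPED)
        if step_int(x, a) != step_typed(x, INT_TO_TYPED[a])
            push!(bad, (x, a))
        end
    end
    return bad
end

mismatches()  # empty when the translation is faithful
```

For stochastic transitions, the same loop would compare the returned distributions (probability per next state) rather than single next states.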

lassepe · 2023-05-12T05:41:26Z (Maintainer)

Great to see you were able to solve the issue!

I'd like to add a general remark regarding the problem setup: by making each action a different type, you will introduce type instability in various places. This may slow down the solver considerably, because these actions are used in tight inner loops of the algorithm. It's difficult to say how pronounced this effect is for the solver you are using, but it would be worth benchmarking and inspecting type stability with `@code_warntype` or Cthulhu.jl.
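A small sketch of where the instability comes from, assuming an `Action{T}` encoding like the one discussed in this thread. `Action{T}`, `reward`, and `total` are hypothetical stand-ins, the reward values are made up, and the `Int` encoding (1 = sample, 2-5 = moves, 6+ = checks) is an assumption.

```julia
# Stand-in `Action{T}` (hypothetical; the forked RockSample code may differ).
struct Action{T}
    arg::Int
end
reward(::Action{:sample}) = 10.0
reward(::Action{:move}) = -1.0
reward(::Action{:check}) = -0.5

# Int encoding (assumed): 1 = sample, 2-5 = moves, 6+ = checks.
reward(a::Int) = a == 1 ? 10.0 : (a <= 5 ? -1.0 : -0.5)

typed_actions = [Action{:sample}(0), Action{:move}(1), Action{:check}(1)]
int_actions = [1, 2, 6]

# Mixing action kinds gives a non-concrete element type (`Action`), so each
# `reward(a)` call in a loop over it cannot be bound to one method at compile
# time and is dispatched at run time instead.
isconcretetype(eltype(typed_actions))  # false
isconcretetype(eltype(int_actions))    # true

# Same computation on both encodings; only the typed one needs dynamic dispatch.
total(actions) = sum(reward, actions)
total(typed_actions) == total(int_actions)  # true (both 8.5)
```

The sketch only shows the non-concrete element type; whether the resulting dynamic dispatch matters for a given solver is exactly what benchmarking and `@code_warntype` or Cthulhu.jl on the solver's hot functions would reveal.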
