(MWE) Changing RockSample actions from Int to Action leads to incredibly different policies #489
-
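I started adapting [...] Due to the magnitude of the changes, I've forked [...] The script I've been using ([...]):
# Assumed imports (not shown in the original post)
using POMDPs, POMDPTools, RockSample
import NativeSARSOP, SARSOP  # both export SARSOPSolver, so the name is qualified below
rs_pomdp = RockSamplePOMDP()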
rs_solver = NativeSARSOP.SARSOPSolver()
# rs_solver = SARSOP.SARSOPSolver()  # alternative solver; swap the comments to use it
rs_policy = solve(rs_solver, rs_pomdp)
rs_b0 = initialize_belief(updater(rs_policy), initialstate(rs_pomdp))
rs_b1 = initialstate(rs_pomdp)
rs_b1.probs[1] = 0.5
rs_b1.probs[2:end] .= 0.5 / (length(rs_b1) - 1)
rs_b1 = initialize_belief(updater(rs_policy), rs_b1)
@show actionvalues(rs_policy, rs_b0)
@show actionvalues(rs_policy, rs_b1)
Is there a known reason why this might happen? (This appears to be solver independent.) ("This" is: moving the action type from [...])
Outputs:
# Using NativeSARSOP.SARSOPSolver on RockSample{RSState, Action, Int}
actionvalues(rs_policy, rs_b0) = [-Inf, -Inf, 8.1450625, -Inf, -Inf, -Inf, -Inf, -Inf]
actionvalues(rs_policy, rs_b1) = [-Inf, -Inf, 8.1450625, -Inf, -Inf, -Inf, -Inf, -Inf]
# Using SARSOP.SARSOPSolver on RockSample{RSState, Action, Int}
actionvalues(rs_policy, rs_b0) = [0.0, 112.12462249999999, 112.12462249999999, -Inf, 1.4968412100000001, 148.25917125, 3.869019999999999, 5.6147487499999995]
actionvalues(rs_policy, rs_b1) = [-85.7142857142857, 66.77228, 66.77228, -Inf, 3.3346878342857145, 87.28554214285714, 1.5705742857142866, 6.497632142857142]
# Using NativeSARSOP.SARSOPSolver on RockSample{RSState, Int, Int}
actionvalues(rs_policy, rs_b0) = [12.488632180920355, 13.145928611495112, 13.10140214786194, 16.034742813047465, 16.034742813047465, 16.926416376397345, 7.5443641406249995, 7.5443641406249995]
actionvalues(rs_policy, rs_b1) = [5.501226292362421, 10.302042713764955, 10.255118671320767, 11.600424860069094, 11.600424860069094, 12.322817261206417, 7.507160730271352, 7.500150138058002]
# Using SARSOP.SARSOPSolver on RockSample{RSState, Int, Int}
actionvalues(rs_policy, rs_b0) = [12.4886425, 13.145922500000001, 10.76643, 16.03475, 16.03475, 16.926436250000002, 11.871007500000001, 12.59692]
actionvalues(rs_policy, rs_b1) = [5.501234285714284, 10.302037142857142, 9.327171428571429, 11.600428571428571, 11.600428571428571, 12.322830714285713, 9.909788571428573, 10.446048571428571]
Replies: 3 comments 2 replies
-
It looks like you forgot to post the output?
-
Turns out, I mistranslated one of the transition steps! No erroneous behavior now that it's sorted. 😅
-
Great to see you were able to solve the issue! I'd like to add a general remark regarding the problem setup: by making each action a different type, you will introduce type instability in various places. This may slow the solver considerably, because these actions are used in tight inner loops of the algorithm. It's difficult to say how pronounced this effect is for the solver you are using, but it would be worth benchmarking and inspecting type stability with [...]
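To make the type-stability remark concrete, here is a minimal sketch in plain Julia. The action types (`MoveNorth`, `MoveEast`, `Sample`), the enum `ActionEnum`, and the two lookup functions are hypothetical names invented for illustration; they are not taken from RockSample.jl or from the fork discussed above. The sketch only compares what the compiler can infer for the two encodings and says nothing about how much either one costs inside a particular solver.

```julia
# Hypothetical encoding (a): one concrete type per action.
abstract type AbstractAction end
struct MoveNorth <: AbstractAction end
struct MoveEast  <: AbstractAction end
struct Sample    <: AbstractAction end

# Hypothetical encoding (b): one concrete enum type shared by all actions.
@enum ActionEnum north east sample

# Each branch returns a different concrete type, so the inferred
# return type cannot be a single concrete type.
action_from_index_types(i::Int) =
    i == 1 ? MoveNorth() : i == 2 ? MoveEast() : Sample()

# Every branch returns an ActionEnum, so the return type is concrete.
action_from_index_enum(i::Int) = ActionEnum(i - 1)

# Ask the compiler what it infers for each function.
rt_types = only(Base.return_types(action_from_index_types, (Int,)))
rt_enum  = only(Base.return_types(action_from_index_enum, (Int,)))

@show rt_types isconcretetype(rt_types)
@show rt_enum isconcretetype(rt_enum)

# A collection of per-type actions gets an abstract element type,
# while a collection of enum values keeps a concrete one.
typed_actions = [MoveNorth(), MoveEast(), Sample()]
enum_actions  = [north, east, sample]
@show eltype(typed_actions) eltype(enum_actions)
```

For an annotated view of the same information, `@code_warntype action_from_index_types(1)` (from the standard library `InteractiveUtils`, loaded automatically in the REPL) highlights non-concrete types. Julia can often handle small unions efficiently, so whether this matters in practice is something to measure rather than assume.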