You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
m: number of actions (Assume it is constant over the states)
Consider bellman_operator, where we compute
vals = R + beta * Q * v and
Tv = s_wise_max(vals).
Let me denote R + beta * Q * v by U(s, a) as a function of (s, a), to distinguish it from a Julia array; let me denote the probabilities by q(sa, s'), where sa is a state-action pair, while s' is a state tomorrow.
To compute s_wise_max, the data for U should be ordered in memory as
U(1, 1), ..., U(1, m), U(2, 1), ..., U(2, m), ..., U(n, 1), ..., U(n, m)
i.e., action should change faster.
So the same applies to R: action should change faster.
To compute Q * v, the data for q should be ordered in memory as
(This is to summarize my comments #109 (comment), #109 (comment), #109 (comment).)
n
: number of statesm
: number of actions (Assume it is constant over the states)Consider
bellman_operator
, where we computevals = R + beta * Q * v
andTv = s_wise_max(vals)
.Let me denote
R + beta * Q * v
by U(s, a) as a function of (s, a), to distinguish it from a Julia array; let me denote the probabilities by q(sa, s'), where sa is a state-action pair, while s' is a state tomorrow.To compute
s_wise_max
, the data for U should be ordered in memory asU(1, 1), ..., U(1, m), U(2, 1), ..., U(2, m), ..., U(n, 1), ..., U(n, m)
i.e., action should change faster.
So the same applies to
R
: action should change faster.To compute
Q * v
, the data for q should be ordered in memory asq(sa_1, 1), ..., q(sa_1, n), ..., q(sa_L, 1), ..., q(sa_L, n)
(where the state-action pairs are listed as sa_1, ..., sa_L)
i.e., state-tomorrow should change faster.
Given 1. and 2., for
Q
, state-tomorrow should change first, action second, and state-today third.The text was updated successfully, but these errors were encountered: