Exercise 2.4 If the step-size parameters, $\alpha_n$, are not constant, then the estimate $Q_n$ is
a weighted average of previously received rewards with a weighting different from that
given by (2.6). What is the weighting on each prior reward for the general case, analogous
to (2.6), in terms of the sequence of step-size parameters?
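If you check the coefficient of $R_n$ in your answer, it comes out to be $\alpha_n \prod_{i=1}^{n} (1-\alpha_i)$, whereas the actual coefficient is $\alpha_n$, since $Q_{n+1} = Q_n + \alpha_n (R_n - Q_n)$.
Hence the correct formulation should be the following:
$$Q_{n+1} = \prod_{i=1}^{n} (1-\alpha_i)\, Q_1 + \sum_{i=1}^{n} \alpha_i \prod_{j=i+1}^{n} (1-\alpha_j)\, R_i$$
i.e. the inner product should run from j = i+1 to n instead of j = 1 to i, so that it is empty (equal to 1) for i = n.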
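One way to check a candidate weighting is to compare it numerically with the incremental update $Q_{k+1} = Q_k + \alpha_k (R_k - Q_k)$. The sketch below does that for the weighting in which $R_i$ gets $\alpha_i \prod_{j=i+1}^{n} (1-\alpha_j)$. The function names and the sample step sizes and rewards are made up for illustration and do not come from the book or the solutions.

```python
import math


def recursive_estimate(q1, alphas, rewards):
    """Apply Q_{k+1} = Q_k + alpha_k * (R_k - Q_k) for k = 1..n."""
    q = q1
    for alpha, reward in zip(alphas, rewards):
        q += alpha * (reward - q)
    return q


def reward_weights(alphas):
    """Weight on R_i: alpha_i * prod_{j=i+1}^{n} (1 - alpha_j).

    The product is empty (equal to 1) for the most recent reward.
    """
    return [
        alphas[i] * math.prod(1 - a for a in alphas[i + 1:])
        for i in range(len(alphas))
    ]


def closed_form_estimate(q1, alphas, rewards):
    """Weighted sum of Q_1 and all rewards using the weights above."""
    q1_weight = math.prod(1 - a for a in alphas)
    weighted_rewards = sum(w * r for w, r in zip(reward_weights(alphas), rewards))
    return q1_weight * q1 + weighted_rewards


# Hypothetical sample data, chosen only to exercise the formulas.
alphas = [0.5, 0.3, 0.2, 0.1]
rewards = [1.0, -2.0, 0.5, 3.0]
q1 = 0.7

print(recursive_estimate(q1, alphas, rewards))
print(closed_form_estimate(q1, alphas, rewards))
print(reward_weights(alphas)[-1])  # weight on the most recent reward
```

With these weights the two estimates agree up to floating-point rounding, and the weight on the most recent reward is just its step size. The weights on $Q_1$ and on all rewards also sum to 1, which is what makes the estimate a weighted average.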
Please correct me if I am wrong. Thank you.
Hi, Hector
I am referring to the second edition of the book.