Answer of exercise 2.4 is wrong #13

AbhishekVarghese · 2020-06-25T06:07:24Z

Hi, Hector

I am referring to the second edition of the book.

Exercise 2.4: If the step-size parameters, $\alpha_n$, are not constant, then the estimate $Q_n$ is
a weighted average of previously received rewards with a weighting different from that
given by (2.6). What is the weighting on each prior reward for the general case, analogous
to (2.6), in terms of the sequence of step-size parameters?

If you check the coefficient of $R_n$ in your answer, it comes out to be $\alpha_n \prod_{i=1}^{n}(1-\alpha_i)$, whereas the actual coefficient is $\alpha_n$.

Hence the correct formulation should be the following:
$$Q_{n+1} = \prod_{i=1}^{n}(1-\alpha_i)\,Q_1 + \sum_{i=1}^{n}\alpha_i \prod_{j=i}^{n}(1-\alpha_j)\,R_i$$

i.e., the inner product should run from j = i to n instead of from j = 1 to i.

Please correct me if I am wrong. Thank you.

niuwagege · 2020-07-28T13:07:25Z

$$Q_{n+1} = \prod_{i=1}^{n}(1-\alpha_i)\,Q_1 + \sum_{i=1}^{n}\alpha_i \prod_{j=i+1}^{n}(1-\alpha_j)\,R_i$$
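
For anyone who wants to check the two inner-product ranges proposed in this thread (starting at j = i versus j = i + 1), here is a minimal numerical sketch. It assumes the standard incremental update Q_{n+1} = Q_n + α_n (R_n − Q_n) from the book; the function names, the `offset` parameter, and the sample step sizes and rewards are made up for illustration and do not come from the repository.

```python
import math

def incremental(q1, alphas, rewards):
    """Apply Q <- Q + alpha * (R - Q) once per (alpha, reward) pair."""
    q = q1
    for a, r in zip(alphas, rewards):
        q += a * (r - q)
    return q

def closed_form(q1, alphas, rewards, offset):
    """Weighted-average form of Q_{n+1}.

    offset=1: inner product runs over j = i+1 .. n (1-indexed).
    offset=0: inner product runs over j = i   .. n (1-indexed).
    Lists are 0-indexed here, so list index i stands for step i+1.
    """
    n = len(alphas)
    total = math.prod(1 - a for a in alphas) * q1
    for i in range(n):
        # math.prod over an empty range is 1, which covers the last reward.
        decay = math.prod(1 - alphas[j] for j in range(i + offset, n))
        total += alphas[i] * decay * rewards[i]
    return total

# Hypothetical sample data, chosen only to make the check concrete.
alphas = [0.5, 0.25, 0.1]
rewards = [1.0, 2.0, 3.0]
q1 = 0.0

target = incremental(q1, alphas, rewards)
from_i_plus_1 = closed_form(q1, alphas, rewards, offset=1)
from_i = closed_form(q1, alphas, rewards, offset=0)

print("incremental:", target)
print("j = i+1..n :", from_i_plus_1)
print("j = i..n   :", from_i)
```

On this sample, the j = i + 1 version agrees with the incremental update, while the j = i version multiplies every reward by an extra (1 − α_i) factor, so the coefficient of R_n becomes α_n (1 − α_n) rather than α_n.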
