Answer of exercise 2.4 is wrong #13

Open
AbhishekVarghese opened this issue Jun 25, 2020 · 1 comment

Comments

AbhishekVarghese commented Jun 25, 2020

Hi, Hector

I am referring to the second edition of the book.

Exercise 2.4 If the step-size parameters, $\alpha_n$, are not constant, then the estimate $Q_n$ is a weighted average of previously received rewards with a weighting different from that given by (2.6). What is the weighting on each prior reward for the general case, analogous to (2.6), in terms of the sequence of step-size parameters?

If you check the coefficient of $R_n$ in your answer, it comes out to be $\alpha_n \prod_{i=1}^{n}(1-\alpha_i)$, whereas the correct coefficient is $\alpha_n$.
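
The coefficient of $R_n$ can be read off from a single step of the incremental update. This is a short sketch, assuming the book's update rule with a step size $\alpha_n$ that varies with $n$:

```latex
\begin{align*}
Q_{n+1} &= Q_n + \alpha_n \left[ R_n - Q_n \right] \\
        &= \alpha_n R_n + (1 - \alpha_n)\, Q_n
\end{align*}
```

Since $Q_n$ depends only on $Q_1$ and $R_1, \dots, R_{n-1}$, the reward $R_n$ appears only in the first term, with weight $\alpha_n$.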

Hence the correct formulation should be the following:

$$Q_{n+1} = \prod_{i=1}^{n}(1-\alpha_i)\,Q_1 + \sum_{i=1}^{n}\alpha_i \prod_{j=i}^{n}(1-\alpha_j)\,R_i$$

i.e. iterate from $j = i$ to $n$ instead of $j = 1$ to $i$.

Please correct me if I am wrong. Thank you.

@niuwagege

$$Q_{n+1} = \prod_{i=1}^{n}(1-\alpha_i)\,Q_1 + \sum_{i=1}^{n}\alpha_i \prod_{j=i+1}^{n}(1-\alpha_j)\,R_i$$
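
The two candidate lower limits for the inner product ($j = i$ and $j = i+1$) can be compared numerically against the incremental update $Q_{n+1} = Q_n + \alpha_n (R_n - Q_n)$. The sketch below is only an illustrative check: the function names `q_recursive` and `q_closed_form`, the `start_offset` parameter, and the sample step sizes and rewards are all made up for this example.

```python
import math


def q_recursive(q1, alphas, rewards):
    """Apply Q <- Q + alpha_n * (R_n - Q) once per (alpha_n, R_n) pair."""
    q = q1
    for alpha, reward in zip(alphas, rewards):
        q += alpha * (reward - q)
    return q


def q_closed_form(q1, alphas, rewards, start_offset):
    """Evaluate the unrolled weighted sum.

    With 1-based index i, the inner product runs over
    j = i + start_offset, ..., n. So start_offset=1 gives the
    j = i+1 variant and start_offset=0 gives the j = i variant.
    """
    n = len(alphas)
    # Weight on the initial estimate Q_1.
    total = math.prod(1 - a for a in alphas) * q1
    for i in range(n):  # i is 0-based here
        # An empty product (range is empty) evaluates to 1.
        decay = math.prod(1 - alphas[j] for j in range(i + start_offset, n))
        total += alphas[i] * decay * rewards[i]
    return total


# Arbitrary sample values, chosen only for illustration.
alphas = [0.5, 0.25, 0.1]
rewards = [1.0, 2.0, 3.0]
q1 = 2.0

target = q_recursive(q1, alphas, rewards)
print("j = i+1 matches recursion:",
      math.isclose(target, q_closed_form(q1, alphas, rewards, 1)))
print("j = i   matches recursion:",
      math.isclose(target, q_closed_form(q1, alphas, rewards, 0)))
```

For these sample values, only the $j = i+1$ variant agrees with the recursion. The $j = i$ variant multiplies each reward by one extra factor of $(1-\alpha_i)$, so the weight on $R_n$ becomes $\alpha_n (1-\alpha_n)$ rather than $\alpha_n$.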
