Implement an exploring reinforcement learning agent that uses direct utility estimation. Make two versions: one with a tabular representation and one using the function approximator in Equation (4x3-linear-approx-equation). Compare their performance in three environments:

  1. The $4\times 3$ world described in the chapter.

  2. A $10\times 10$ world with no obstacles and a +1 reward at (10,10).

  3. A $10\times 10$ world with no obstacles and a +1 reward at (5,5).
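
As a starting point, the two estimators might be sketched as follows. This is a minimal sketch under several assumptions that are not part of the exercise: the referenced equation is taken to be the linear form $\hat{U}_\theta(x,y) = \theta_0 + \theta_1 x + \theta_2 y$; exploration is a uniform random walk; moves are deterministic; the step reward is -0.04; trials start at (1,1); and only the obstacle-free worlds are covered (the $4\times 3$ world would also need its obstacle, its -1 terminal, and stochastic moves). All function names and parameters (`run_trial`, `tabular_due`, `linear_due`, `alpha`, `max_steps`) are hypothetical.

```python
import random
from collections import defaultdict


def rewards_to_go(rewards, gamma=1.0):
    """Discounted sum of rewards from each time step to the end of the trial."""
    out, total = [], 0.0
    for r in reversed(rewards):
        total = r + gamma * total
        out.append(total)
    return out[::-1]


def run_trial(width, height, goal, start=(1, 1), step_reward=-0.04,
              max_steps=200, rng=random):
    """Random-walk trial in an obstacle-free grid (assumed exploration policy).

    Returns the visited states and the reward received in each of them.
    Trials that hit max_steps are truncated, which biases their targets.
    """
    x, y = start
    states, rewards = [], []
    for _ in range(max_steps):
        states.append((x, y))
        if (x, y) == goal:
            rewards.append(1.0)  # terminal reward, then stop
            break
        rewards.append(step_reward)
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x = min(max(x + dx, 1), width)   # bumping into a wall leaves x unchanged
        y = min(max(y + dy, 1), height)
    return states, rewards


def tabular_due(trials, gamma=1.0):
    """Tabular estimate: average observed reward-to-go per visited state."""
    sums, counts = defaultdict(float), defaultdict(int)
    for states, rewards in trials:
        for s, u in zip(states, rewards_to_go(rewards, gamma)):
            sums[s] += u
            counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}


def linear_due(trials, alpha=0.001, gamma=1.0):
    """Fit theta0 + theta1*x + theta2*y to reward-to-go with the delta rule."""
    theta = [0.0, 0.0, 0.0]
    for states, rewards in trials:
        for (x, y), u in zip(states, rewards_to_go(rewards, gamma)):
            err = u - (theta[0] + theta[1] * x + theta[2] * y)
            theta[0] += alpha * err
            theta[1] += alpha * err * x
            theta[2] += alpha * err * y
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    trials = [run_trial(10, 10, goal=(10, 10), rng=rng) for _ in range(500)]
    U = tabular_due(trials)
    theta = linear_due(trials)
    print(len(U), "states estimated; theta =", theta)
```

Changing `goal` to `(5, 5)` gives the third environment. A linear function of $x$ and $y$ cannot have a peak in the interior of the grid, so that is one place where the two versions may be expected to differ.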