Hardware: Google Colab (NVIDIA T4 GPU)

| Model Type | Discrete Actions | Average Reward | Training Time (h:mm:ss) | Total Training Steps |
|---|---|---|---|---|
| PPO | No | 887.84 | 5:33:03 | 751,614 |
| SAC | No | 610.67 | 6:29:16 | 333,116 |
| DQN | Yes | 897.77 | 5:41:22 | 750,000 |

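Training time alone is hard to compare because the runs completed different numbers of steps. The quick sketch below converts the table's `h:mm:ss` times into environment steps per second. It treats Training Time as wall-clock seconds, and the helper names `to_seconds` and `steps_per_second` are hypothetical.

```python
# Hypothetical helpers: derive training throughput from the results table.
RESULTS = {
    # model: (training time as "h:mm:ss", total training steps)
    "PPO": ("5:33:03", 751_614),
    "SAC": ("6:29:16", 333_116),
    "DQN": ("5:41:22", 750_000),
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def steps_per_second(hms: str, steps: int) -> float:
    """Average environment steps per wall-clock second."""
    return steps / to_seconds(hms)


if __name__ == "__main__":
    for model, (hms, steps) in RESULTS.items():
        print(f"{model}: {steps_per_second(hms, steps):.1f} steps/s")
```

By this measure PPO and DQN both run at roughly 37 steps per second, while SAC manages about 14. SAC's lower step count therefore reflects about 2.6x lower throughput, not a shorter run.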
- Set `ent_coef` for PPO, as a positive entropy coefficient encourages exploration of other actions. Stable Baselines3 defaults this value to `0.0`.
- Do not set `eval_freq` too low (e.g., keep it at 10,000 or higher). Frequent interruptions for evaluation can sometimes cause instability during learning.
- `buffer_size` defaults to 1,000,000 for DQN and SAC, which requires a significant amount of memory. Try a more practical value when using the original observation space (e.g., 200,000).
- Set the `gray_scale` flag in the notebooks to `True` to allow DQN and SAC to run without the High-RAM option in Google Colab (with `buffer_size` <= 150,000). This converts the observations from 96 x 96 x 3 RGB images to 84 x 84 grayscale images.
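A positive `ent_coef` encourages exploration because PPO subtracts an entropy bonus from its loss, so policies that keep more randomness in their actions are penalized less. The pure-Python sketch below mirrors the shape of Stable Baselines3's PPO loss, `policy_loss - ent_coef * entropy + vf_coef * value_loss`. It assumes a diagonal Gaussian policy, since PPO uses continuous actions here. The function names, the three action dimensions, the `0.01` coefficient, and the loss values are illustrative assumptions, not values from the notebooks.

```python
import math


def gaussian_entropy(stds):
    """Entropy of a diagonal Gaussian: sum of 0.5 * ln(2*pi*e*sigma^2) per dimension."""
    return sum(0.5 * math.log(2 * math.pi * math.e * s**2) for s in stds)


def ppo_loss(policy_loss, value_loss, entropy, ent_coef=0.0, vf_coef=0.5):
    """Combined PPO loss: the entropy bonus is subtracted, scaled by ent_coef."""
    return policy_loss - ent_coef * entropy + vf_coef * value_loss


if __name__ == "__main__":
    # Illustrative policies with 3 action dimensions (an assumption).
    narrow = gaussian_entropy([0.1, 0.1, 0.1])  # nearly deterministic policy
    wide = gaussian_entropy([1.0, 1.0, 1.0])    # more exploratory policy
    for coef in (0.0, 0.01):
        print(coef, ppo_loss(0.2, 1.0, narrow, coef), ppo_loss(0.2, 1.0, wide, coef))
```

With `ent_coef=0.0` the entropy term drops out, so nothing in the loss discourages the action distribution from collapsing. With a small positive value, the wider (more exploratory) policy receives a lower loss.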
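Simple arithmetic shows the memory pressure behind the `buffer_size` advice. The sketch below makes several assumptions:

- observations are stored as `uint8` (1 byte per value);
- the buffer keeps a separate copy of each next observation (Stable Baselines3's `ReplayBuffer` does this unless `optimize_memory_usage=True`);
- no frame stacking is applied.

Actions, rewards, and done flags are ignored because they are tiny by comparison. The function names `replay_buffer_bytes` and `gib` are hypothetical.

```python
import math


def replay_buffer_bytes(buffer_size: int, obs_shape: tuple, bytes_per_value: int = 1,
                        store_next_obs: bool = True) -> int:
    """Approximate memory used by the observation arrays of a replay buffer.

    Assumes uint8 observations (1 byte per value) and, by default, a separate
    array for next observations; actions, rewards and done flags are ignored.
    """
    per_obs = math.prod(obs_shape) * bytes_per_value
    copies = 2 if store_next_obs else 1
    return buffer_size * per_obs * copies


def gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    cases = [
        ("RGB, default buffer", 1_000_000, (96, 96, 3)),
        ("RGB, reduced buffer", 200_000, (96, 96, 3)),
        ("Grayscale, smaller buffer", 150_000, (84, 84)),
    ]
    for label, size, shape in cases:
        print(f"{label}: {gib(replay_buffer_bytes(size, shape)):.1f} GiB")
```

Under these assumptions, the estimates are:

- the default buffer with RGB observations needs about 51.5 GiB;
- 200,000 RGB transitions need about 10.3 GiB;
- 150,000 grayscale transitions need about 2.0 GiB.

The last figure is consistent with the grayscale setting fitting without High-RAM. Stacking frames would multiply each figure by the number of stacked frames.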
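To illustrate what the `gray_scale` flag changes, the sketch below converts a 96 x 96 x 3 RGB frame into an 84 x 84 single-channel image with NumPy. It is not the notebooks' implementation. It assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) and uses nearest-neighbour resizing for brevity, whereas image libraries usually interpolate. The function name `to_gray_84` is hypothetical.

```python
import numpy as np

# ITU-R BT.601 luma weights for the R, G and B channels (an assumption here).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_gray_84(frame: np.ndarray, size: int = 84) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB frame to a (size, size) uint8 grayscale frame."""
    # Weighted sum over the colour channel collapses (H, W, 3) to (H, W).
    gray = frame.astype(np.float32) @ LUMA_WEIGHTS
    # Nearest-neighbour resize: pick evenly spaced source rows and columns.
    rows = np.arange(size) * frame.shape[0] // size
    cols = np.arange(size) * frame.shape[1] // size
    resized = gray[rows[:, None], cols[None, :]]
    # Round back to the uint8 range used for compact replay-buffer storage.
    return np.clip(np.rint(resized), 0, 255).astype(np.uint8)
```

Each stored frame shrinks from 27,648 values to 7,056, roughly a 4x reduction. This is what lets a replay buffer of the same length fit in much less RAM.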