
Release v0.70

@takuseno released this · 18 Feb 10:19

Command Line Interface

New commands have been added in this version.

record

You can record videos of the evaluation episodes without writing any code.

$ d3rlpy record d3rlpy_logs/CQL_20201224224314/model_100.pt --env-id HopperBulletEnv-v0

# record wrapped environment
$ d3rlpy record d3rlpy_logs/Discrete_CQL_20201224224314/model_100.pt \
    --env-header 'import gym; env = d3rlpy.envs.Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'

play

You can run the evaluation episodes with rendering enabled.

# play simple environment
$ d3rlpy play d3rlpy_logs/CQL_20201224224314/model_100.pt --env-id HopperBulletEnv-v0

# play wrapped environment
$ d3rlpy play d3rlpy_logs/Discrete_CQL_20201224224314/model_100.pt \
    --env-header 'import gym; env = d3rlpy.envs.Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'

data-point mask for bootstrapping

Ensemble training of Q-functions has been shown to be a powerful method for achieving robust training. Previously, the bootstrap option was available for algorithms, but the mask for the Q-function loss was randomly created every time a batch was sampled.

In this version, the create_mask option is available for MDPDataset and ReplayBuffer, which creates a unique mask at each data point.

import d3rlpy

# offline training
# (observations, actions, rewards, terminals are your dataset arrays)
dataset = d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, create_mask=True, mask_size=5)
cql = d3rlpy.algos.CQL(n_critics=5, bootstrap=True, target_reduction_type='none')
cql.fit(dataset)

# online training (env is a Gym environment)
buffer = d3rlpy.online.buffers.ReplayBuffer(1000000, create_mask=True, mask_size=5)
sac = d3rlpy.algos.SAC(n_critics=5, bootstrap=True, target_reduction_type='none')
sac.fit_online(env, buffer)

As shown above, target_reduction_type is newly introduced to specify how target Q-values are aggregated. In standard Soft Actor-Critic, target_reduction_type='min'. If you choose 'none', each ensemble Q-function uses its own target value, which is similar to what Bootstrapped DQN does.
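For intuition, here is a minimal sketch of what the two reduction modes compute. The tensor shapes and variable names are illustrative only, not d3rlpy internals.

import torch

# hypothetical ensemble of target Q-values: shape (n_critics, batch_size)
target_qs = torch.tensor([[1.0, 2.0],
                          [0.5, 3.0],
                          [1.5, 2.5]])

# target_reduction_type='min': every critic bootstraps from the
# pessimistic minimum over the ensemble (standard Soft Actor-Critic)
min_target = target_qs.min(dim=0).values  # shape (batch_size,)

# target_reduction_type='none': each critic keeps its own target value,
# similar to Bootstrapped DQN
independent_targets = target_qs  # shape (n_critics, batch_size)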

better module access

From this version, you can access all modules through the top-level d3rlpy import.

# previously
from d3rlpy.datasets import get_cartpole
dataset = get_cartpole()

# v0.70
import d3rlpy
dataset = d3rlpy.datasets.get_cartpole()

new logger style

From this version, structlog is used internally to print information instead of the raw print function. This allows d3rlpy to emit more structured information. Furthermore, you can control what is shown and what is saved to a file by overriding the logger configuration.
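As an example, a minimal configuration that renders every log event as a JSON line could look like the following. This is plain structlog usage, not a d3rlpy-specific API.

import structlog

# emit one JSON object per log event instead of the default console output
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],
)

log = structlog.get_logger()
log.info("epoch finished", epoch=1, loss=0.123)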


enhancements

  • soft_q_backup option is added to CQL (see the sketch after this list).
  • A Paper Reproduction page has been added to the documentation to show performance with the paper configurations.
  • The commit method of D3RLPyLogger now returns metrics (thanks, @jamartinh).
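For example, the new CQL option can be enabled like this, a minimal sketch reusing the dataset from the bootstrapping example above:

# enable the new soft Q backup option in CQL
cql = d3rlpy.algos.CQL(soft_q_backup=True)
cql.fit(dataset)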

bugfix

  • fix the epoch count in offline training.
  • fix the total_step count in online training.
  • fix typos in the documentation (thanks, @pstansell).