Release v0.70
Command Line Interface
Two new commands have been added in this version.
record
You can record videos of the evaluation episodes without writing any code.
# record simple environment
$ d3rlpy record d3rlpy_logs/CQL_20201224224314/model_100.pt --env-id HopperBulletEnv-v0
# record wrapped environment
$ d3rlpy record d3rlpy_logs/Discrete_CQL_20201224224314/model_100.pt \
--env-header 'import gym; env = d3rlpy.envs.Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'
play
You can run the evaluation episodes while rendering the environment.
# play simple environment
$ d3rlpy play d3rlpy_logs/CQL_20201224224314/model_100.pt --env-id HopperBulletEnv-v0
# play wrapped environment
$ d3rlpy play d3rlpy_logs/Discrete_CQL_20201224224314/model_100.pt \
--env-header 'import gym; env = d3rlpy.envs.Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'
data-point mask for bootstrapping
Ensemble training of Q-functions has been shown to be a powerful method for robust training. Previously, the bootstrap option has been available for algorithms, but the mask for the Q-function loss was randomly created every time a batch was sampled. In this version, the create_mask option is available for MDPDataset and ReplayBuffer, which creates a unique mask at each data point.
# offline training
dataset = d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, create_mask=True, mask_size=5)
cql = d3rlpy.algos.CQL(n_critics=5, bootstrap=True, target_reduction_type='none')
cql.fit(dataset)
# online training
buffer = d3rlpy.online.buffers.ReplayBuffer(1000000, create_mask=True, mask_size=5)
sac = d3rlpy.algos.SAC(n_critics=5, bootstrap=True, target_reduction_type='none')
sac.fit_online(env, buffer)
As you may have noticed above, target_reduction_type is newly introduced to specify how target Q-values are aggregated across the ensemble. The standard Soft Actor-Critic corresponds to target_reduction_type='min'. If you choose 'none', each ensemble Q-function uses its own target value, which is similar to what Bootstrapped DQN does.
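To make the difference concrete, here is a minimal sketch (not d3rlpy's internal implementation) of how the two reduction modes could aggregate an ensemble's target values; the function name and the (n_critics, batch_size, 1) shape are assumptions for illustration.
import torch

def reduce_targets(targets: torch.Tensor, reduction_type: str) -> torch.Tensor:
    # targets: (n_critics, batch_size, 1), one target Q-value per critic
    if reduction_type == 'min':
        # standard Soft Actor-Critic: all critics share the pessimistic
        # minimum over the ensemble
        return targets.min(dim=0).values
    if reduction_type == 'none':
        # Bootstrapped DQN-style: each critic keeps its own target
        return targets
    raise ValueError(f'unknown reduction_type: {reduction_type}')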
better module access
From this version, you can access every module through the top-level d3rlpy import.
# previously
from d3rlpy.datasets import get_cartpole
dataset = get_cartpole()
# v0.70
import d3rlpy
dataset = d3rlpy.datasets.get_cartpole()
new logger style
From this version, structlog is used internally to print information instead of the raw print function. This allows d3rlpy to emit more structured information. Furthermore, you can control what is shown and what is saved to a file by overriding the logger configuration.
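For example, here is a minimal sketch of overriding structlog's global configuration; it uses standard structlog APIs, and the processor choices are illustrative assumptions rather than d3rlpy's defaults.
import structlog

# render every record as a JSON line with an ISO timestamp
structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt='iso'),
        structlog.processors.JSONRenderer(),
    ],
)

structlog.get_logger().info('epoch finished', epoch=1, loss=0.5)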
enhancements
- soft_q_backup option is added to CQL.
- Paper Reproduction page has been added to the documentation in order to show the performance with the paper configuration.
- commit method at D3RLPyLogger now returns metrics (thanks, @jamartinh); a usage sketch follows below.
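As a rough usage sketch of the last item above: the constructor and method signatures here are assumptions about the internal logger API, so check d3rlpy/logger.py for the exact details.
from d3rlpy.logger import D3RLPyLogger

# assumed signatures; D3RLPyLogger is an internal class, not a documented API
logger = D3RLPyLogger('example_experiment')
logger.add_metric('critic_loss', 0.5)
metrics = logger.commit(epoch=1, step=100)  # commit now returns the aggregated metrics
print(metrics)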
bugfix
- fix epoch count in offline training.
- fix total_step count in online training.
- fix typos in documentation (thanks, @pstansell).