Release v0.91
Algorithm
RewardScaler
Starting with this version, preprocessors are available for rewards, allowing you to normalize, standardize, and clip reward values.
```python
import d3rlpy

# normalize rewards into [0, 1]
cql = d3rlpy.algos.CQL(reward_scaler="min_max")

# standardize rewards to zero mean and unit variance
cql = d3rlpy.algos.CQL(reward_scaler="standardize")

# clip rewards (no string alias is available; pass the scaler object directly)
cql = d3rlpy.algos.CQL(reward_scaler=d3rlpy.preprocessing.ClipRewardScaler(-1.0, 1.0))
```
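As a rough illustration of what the three transforms compute, here is a minimal NumPy sketch. This is not the library implementation (the actual scalers fit their statistics on the dataset passed to training); the reward array below is a made-up stand-in.

```python
import numpy as np

# hypothetical reward values standing in for the rewards of an MDPDataset
rewards = np.array([0.0, 1.0, -3.0, 5.0, 2.0])

# min-max normalization: map rewards into [0, 1]
normalized = (rewards - rewards.min()) / (rewards.max() - rewards.min())

# standardization: zero mean, unit variance
standardized = (rewards - rewards.mean()) / rewards.std()

# clipping: bound rewards to [-1, 1]
clipped = np.clip(rewards, -1.0, 1.0)
```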
copy_policy_from and copy_q_function_from methods
In a fine-tuning scenario, you might want to initialize SAC's policy function with the pretrained CQL's policy function to boost initial performance. Starting with this version, you can do that as follows:
```python
import d3rlpy

# pretrain with static dataset
cql = d3rlpy.algos.CQL()
cql.fit(...)

# transfer the policy function
sac = d3rlpy.algos.SAC()
sac.copy_policy_from(cql)

# you can also transfer the Q-function
sac.copy_q_function_from(cql)

# finetuning with online algorithm
sac.fit_online(...)
```
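Conceptually, the copy amounts to transferring network parameters between models, which only makes sense when the source and target architectures match. The sketch below shows the underlying idea with toy PyTorch networks rather than d3rlpy's actual policy classes.

```python
import torch
import torch.nn as nn

# toy networks standing in for the pretrained and target policies;
# the copy only works when the architectures match exactly
source_policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# the parameter transfer boils down to a state_dict copy
target_policy.load_state_dict(source_policy.state_dict())

# sanity check: the two networks now produce identical outputs
x = torch.randn(1, 4)
assert torch.allclose(source_policy(x), target_policy(x))
```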
Enhancements
- show messages for skipping model builds
- add `alpha` parameter option to `DiscreteCQL`
- keep counting the number of gradient steps
- allow expanding MDPDataset with larger discrete actions (thanks, @jamartinh)
- `callback` function is now called every gradient step (previously, it was called every epoch); see the sketch after this list
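Putting two of the items above together, the sketch below constructs `DiscreteCQL` with the new `alpha` option and passes a logging callback to `fit`. The callback signature `(algo, epoch, total_step)`, the `alpha` value, and the `fit` arguments are assumptions for illustration; adjust them to the actual API.

```python
import d3rlpy

# DiscreteCQL now exposes an alpha option (the value here is arbitrary)
cql = d3rlpy.algos.DiscreteCQL(alpha=1.0)

# a logging callback; assuming it receives (algo, epoch, total_step),
# it now fires on every gradient step rather than once per epoch
def log_progress(algo, epoch, total_step):
    if total_step % 100 == 0:
        print(f"epoch={epoch}, step={total_step}")

# a small bundled dataset, used here purely for illustration
dataset, env = d3rlpy.datasets.get_cartpole()

cql.fit(dataset.episodes, n_epochs=1, callback=log_progress)
```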
Bugfix
- FQE's loss function has been fixed (thanks for the report, @guyk1971)
- fix documentation build (thanks, @astrojuanlu)
- fix d4rl dataset conversion to MDPDataset (this will have a significant impact on performance with d4rl datasets)