Release v0.91
Algorithm
RewardScaler
Starting with this version, preprocessors are available for rewards, allowing you to normalize, standardize, and clip reward values.
```python
import d3rlpy

# normalize rewards into [0, 1]
cql = d3rlpy.algos.CQL(reward_scaler="min_max")

# standardize rewards to zero mean and unit variance
cql = d3rlpy.algos.CQL(reward_scaler="standardize")

# clip rewards (no string alias is available; pass the scaler object directly)
cql = d3rlpy.algos.CQL(reward_scaler=d3rlpy.preprocessing.ClipRewardScaler(-1.0, 1.0))
```
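As a rough illustration of what the three transforms compute, here is a minimal NumPy sketch. This is not the library implementation (the actual scalers fit their statistics on the dataset passed to training); the reward array below is a made-up stand-in.

```python
import numpy as np

# hypothetical reward values standing in for the rewards of an MDPDataset
rewards = np.array([0.0, 1.0, -3.0, 5.0, 2.0])

# min-max normalization: map rewards into [0, 1]
normalized = (rewards - rewards.min()) / (rewards.max() - rewards.min())

# standardization: zero mean, unit variance
standardized = (rewards - rewards.mean()) / rewards.std()

# clipping: bound rewards to [-1, 1]
clipped = np.clip(rewards, -1.0, 1.0)
```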
copy_policy_from and copy_q_function_from methods
In a fine-tuning scenario, you might want to initialize SAC's policy function with the pretrained CQL's policy function to boost initial performance. Starting with this version, you can do that as follows:
```python
import d3rlpy

# pretrain with static dataset
cql = d3rlpy.algos.CQL()
cql.fit(...)

# transfer the policy function
sac = d3rlpy.algos.SAC()
sac.copy_policy_from(cql)

# you can also transfer the Q-function
sac.copy_q_function_from(cql)

# finetuning with online algorithm
sac.fit_online(...)
```
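Conceptually, the copy amounts to transferring network parameters between models, which only makes sense when the source and target architectures match. The sketch below shows the underlying idea with toy PyTorch networks rather than d3rlpy's actual policy classes.

```python
import torch
import torch.nn as nn

# toy networks standing in for the pretrained and target policies;
# the copy only works when the architectures match exactly
source_policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# the parameter transfer boils down to a state_dict copy
target_policy.load_state_dict(source_policy.state_dict())

# sanity check: the two networks now produce identical outputs
x = torch.randn(1, 4)
assert torch.allclose(source_policy(x), target_policy(x))
```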
Enhancements
- show messages for skipping model builds
- add `alpha` parameter option to `DiscreteCQL`
- keep counting the number of gradient steps
- allow expanding MDPDataset with larger discrete actions (thanks, @jamartinh)
- `callback` function is now called every gradient step (previously, it was called every epoch); see the sketch after this list
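Putting two of the items above together, the sketch below constructs `DiscreteCQL` with the new `alpha` option and passes a logging callback to `fit`. The callback signature `(algo, epoch, total_step)`, the `alpha` value, and the `fit` arguments are assumptions for illustration; adjust them to the actual API.

```python
import d3rlpy

# DiscreteCQL now exposes an alpha option (the value here is arbitrary)
cql = d3rlpy.algos.DiscreteCQL(alpha=1.0)

# a logging callback; assuming it receives (algo, epoch, total_step),
# it now fires on every gradient step rather than once per epoch
def log_progress(algo, epoch, total_step):
    if total_step % 100 == 0:
        print(f"epoch={epoch}, step={total_step}")

# a small bundled dataset, used here purely for illustration
dataset, env = d3rlpy.datasets.get_cartpole()

cql.fit(dataset.episodes, n_epochs=1, callback=log_progress)
```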
Bugfix
- FQE's loss function has been fixed (thanks for the report, @guyk1971)
- fix documentation build (thanks, @astrojuanlu)
- fix d4rl dataset conversion to MDPDataset (this will have a significant impact on performance with d4rl datasets)