
Release v0.91

Released by @takuseno on 25 Jul 10:08

Algorithm

RewardScaler

Starting with this version, preprocessors are available for rewards, allowing you to normalize, standardize, and clip reward values.

import d3rlpy

# normalize
cql = d3rlpy.algos.CQL(reward_scaler="min_max")

# standardize
cql = d3rlpy.algos.CQL(reward_scaler="standardize")

# clip (there is no string alias for clipping; pass the scaler object directly)
cql = d3rlpy.algos.CQL(reward_scaler=d3rlpy.preprocessing.ClipRewardScaler(-1.0, 1.0))
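
For intuition, the sketch below shows the arithmetic each scaler applies to a batch of rewards. This is a plain NumPy illustration with made-up reward values, not d3rlpy's implementation.

import numpy as np

# hypothetical reward values, for illustration only
rewards = np.array([-5.0, 0.0, 3.0, 10.0])

# "min_max": rescale rewards into [0, 1] using the minimum and maximum
min_max = (rewards - rewards.min()) / (rewards.max() - rewards.min())

# "standardize": subtract the mean and divide by the standard deviation
standardized = (rewards - rewards.mean()) / rewards.std()

# ClipRewardScaler(-1.0, 1.0): clamp each reward into [-1, 1]
clipped = np.clip(rewards, -1.0, 1.0)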

copy_policy_from and copy_q_function_from methods

In a fine-tuning scenario, you might want to initialize SAC's policy with a pretrained CQL policy to boost initial performance. Starting with this version, you can do that as follows:

import d3rlpy

# pretrain with a static dataset
cql = d3rlpy.algos.CQL()
cql.fit(...)

# transfer the policy function
sac = d3rlpy.algos.SAC()
sac.copy_policy_from(cql)

# you can also transfer the Q-function
sac.copy_q_function_from(cql)

# fine-tune with the online algorithm
sac.fit_online(...)
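
Conceptually, this kind of transfer amounts to copying network parameters between two modules with the same architecture. The PyTorch snippet below is only a sketch of that idea (the network shapes are made up), not d3rlpy's internal code.

import torch.nn as nn

# two hypothetical policy networks with identical architectures
pretrained_policy = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 2))
new_policy = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 2))

# copy the pretrained parameters into the new policy
new_policy.load_state_dict(pretrained_policy.state_dict())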

Enhancements

  • show a message when a model build is skipped
  • add an alpha parameter option to DiscreteCQL
  • keep counting the number of gradient steps
  • allow expanding MDPDataset with larger discrete actions (thanks, @jamartinh)
  • the callback function is now called at every gradient step instead of every epoch; see the sketch after this list
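
The sketch below illustrates the change in callback timing with a bare-bones training loop. The loop and the (epoch, total_step) callback signature are assumptions for illustration, not d3rlpy's API.

# hypothetical trainer loop: the callback now fires after every gradient step
def train(n_epochs, n_steps_per_epoch, callback):
    total_step = 0
    for epoch in range(n_epochs):
        for _ in range(n_steps_per_epoch):
            # ... one gradient step would run here ...
            total_step += 1
            callback(epoch, total_step)  # previously this fired only once per epoch

train(n_epochs=2, n_steps_per_epoch=3, callback=lambda e, s: print(f"epoch={e} step={s}"))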

Bugfix

  • fix FQE's loss function (thanks for the report, @guyk1971)
  • fix the documentation build (thanks, @astrojuanlu)
  • fix d4rl dataset conversion to MDPDataset (this has a significant impact on performance for d4rl datasets)