ReBRAC finetune #9

Merged: 16 commits, Dec 6, 2023
93 changes: 46 additions & 47 deletions README.md

Large diffs are not rendered by default.

1,099 changes: 1,099 additions & 0 deletions algorithms/finetune/rebrac.py

Large diffs are not rendered by default.

35 changes: 35 additions & 0 deletions configs/finetune/rebrac/antmaze/large_diverse_v2.yaml
@@ -0,0 +1,35 @@
actor_bc_coef: 0.002
actor_learning_rate: 0.0003
actor_ln: false
actor_n_hiddens: 3
batch_size: 256
critic_bc_coef: 0.002
critic_learning_rate: 0.00005
critic_ln: true
critic_n_hiddens: 3
dataset_name: antmaze-large-diverse-v2
eval_episodes: 100
eval_every: 50000
eval_seed: 42
expl_noise: 0.0
gamma: 0.999
group: rebrac-finetune-antmaze-large-diverse-v2
hidden_dim: 256
min_decay_coef: 0.5
mixing_ratio: 0.5
name: rebrac-finetune
noise_clip: 0.5
normalize_q: true
normalize_reward: true
normalize_states: false
num_offline_updates: 1000000
num_online_updates: 1000000
num_warmup_steps: 0
policy_freq: 2
policy_noise: 0.2
project: CORL
replay_buffer_size: 2000000
reset_opts: false
tau: 0.005
train_seed: 0
use_calibration: false
35 changes: 35 additions & 0 deletions configs/finetune/rebrac/antmaze/large_play_v2.yaml
@@ -0,0 +1,35 @@
actor_bc_coef: 0.002
actor_learning_rate: 0.0003
actor_ln: false
actor_n_hiddens: 3
batch_size: 256
critic_bc_coef: 0.001
critic_learning_rate: 0.00005
critic_ln: true
critic_n_hiddens: 3
dataset_name: antmaze-large-play-v2
eval_episodes: 100
eval_every: 50000
eval_seed: 42
expl_noise: 0.0
gamma: 0.999
group: rebrac-finetune-antmaze-large-play-v2
hidden_dim: 256
min_decay_coef: 0.5
mixing_ratio: 0.5
name: rebrac-finetune
noise_clip: 0.5
normalize_q: true
normalize_reward: true
normalize_states: false
num_offline_updates: 1000000
num_online_updates: 1000000
num_warmup_steps: 0
policy_freq: 2
policy_noise: 0.2
project: CORL
replay_buffer_size: 2000000
reset_opts: false
tau: 0.005
train_seed: 0
use_calibration: false
35 changes: 35 additions & 0 deletions configs/finetune/rebrac/antmaze/medium_diverse_v2.yaml
@@ -0,0 +1,35 @@
actor_bc_coef: 0.001
actor_learning_rate: 0.0003
actor_ln: false
actor_n_hiddens: 3
batch_size: 256
critic_bc_coef: 0.0
critic_learning_rate: 0.00005
critic_ln: true
critic_n_hiddens: 3
dataset_name: antmaze-medium-diverse-v2
eval_episodes: 100
eval_every: 50000
eval_seed: 42
expl_noise: 0.0
gamma: 0.999
group: rebrac-finetune-antmaze-medium-diverse-v2
hidden_dim: 256
min_decay_coef: 0.5
mixing_ratio: 0.5
name: rebrac-finetune
noise_clip: 0.5
normalize_q: true
normalize_reward: true
normalize_states: false
num_offline_updates: 1000000
num_online_updates: 1000000
num_warmup_steps: 0
policy_freq: 2
policy_noise: 0.2
project: CORL
replay_buffer_size: 2000000
reset_opts: false
tau: 0.005
train_seed: 0
use_calibration: false
35 changes: 35 additions & 0 deletions configs/finetune/rebrac/antmaze/medium_play_v2.yaml
@@ -0,0 +1,35 @@
actor_bc_coef: 0.001
actor_learning_rate: 0.0003
actor_ln: false
actor_n_hiddens: 3
batch_size: 256
critic_bc_coef: 0.0005
critic_learning_rate: 0.00005
critic_ln: true
critic_n_hiddens: 3
dataset_name: antmaze-medium-play-v2
eval_episodes: 100
eval_every: 50000
eval_seed: 42
expl_noise: 0.0
gamma: 0.999
group: rebrac-finetune-antmaze-medium-play-v2
hidden_dim: 256
min_decay_coef: 0.5
mixing_ratio: 0.5
name: rebrac-finetune
noise_clip: 0.5
normalize_q: true
normalize_reward: true
normalize_states: false
num_offline_updates: 1000000
num_online_updates: 1000000
num_warmup_steps: 0
policy_freq: 2
policy_noise: 0.2
project: CORL
replay_buffer_size: 2000000
reset_opts: false
tau: 0.005
train_seed: 0
use_calibration: false
35 changes: 35 additions & 0 deletions configs/finetune/rebrac/antmaze/umaze_diverse_v2.yaml
@@ -0,0 +1,35 @@
actor_bc_coef: 0.003
actor_learning_rate: 0.0003
actor_ln: false
actor_n_hiddens: 3
batch_size: 256
critic_bc_coef: 0.001
critic_learning_rate: 0.00005
critic_ln: true
critic_n_hiddens: 3
dataset_name: antmaze-umaze-diverse-v2
eval_episodes: 100
eval_every: 50000
eval_seed: 42
expl_noise: 0.0
gamma: 0.999
group: rebrac-finetune-antmaze-umaze-diverse-v2
hidden_dim: 256
min_decay_coef: 0.5
mixing_ratio: 0.5
name: rebrac-finetune
noise_clip: 0.5
normalize_q: true
normalize_reward: true
normalize_states: false
num_offline_updates: 1000000
num_online_updates: 1000000
num_warmup_steps: 0
policy_freq: 2
policy_noise: 0.2
project: CORL
replay_buffer_size: 2000000
reset_opts: false
tau: 0.005
train_seed: 0
use_calibration: false
36 changes: 36 additions & 0 deletions configs/finetune/rebrac/antmaze/umaze_v2.yaml
@@ -0,0 +1,36 @@
actor_bc_coef: 0.003
actor_learning_rate: 0.0003
actor_ln: false
actor_n_hiddens: 3
batch_size: 256
critic_bc_coef: 0.002
critic_learning_rate: 0.00005
critic_ln: true
critic_n_hiddens: 3
dataset_name: antmaze-umaze-v2
eval_episodes: 100
eval_every: 50000
eval_seed: 42
expl_noise: 0.0
gamma: 0.999
group: rebrac-finetune-antmaze-umaze-v2
hidden_dim: 256
min_decay_coef: 0.5
mixing_ratio: 0.5
name: rebrac-finetune
noise_clip: 0.5
normalize_q: true
normalize_reward: true
normalize_states: false
num_offline_updates: 1000000
num_online_updates: 1000000
num_warmup_steps: 0
policy_freq: 2
policy_noise: 0.2
project: CORL
replay_buffer_size: 2000000
reset_opts: false
tau: 0.005
train_seed: 0
use_calibration: false
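
A minimal sketch of how `batch_size: 256` and `mixing_ratio: 0.5` might interact during the online phase. It assumes `mixing_ratio` is the fraction of each training batch drawn from the offline dataset, with the rest drawn from online experience. The diff for `algorithms/finetune/rebrac.py` is not rendered here, so this reading is an assumption, and `sample_mixed_batch` is a hypothetical helper, not a function from the PR.

```python
import random


def sample_mixed_batch(offline_data, online_data, batch_size=256,
                       mixing_ratio=0.5, rng=None):
    """Draw one batch mixing offline and online transitions.

    ASSUMPTION: mixing_ratio is the share of the batch taken from the
    offline dataset; the actual semantics in rebrac.py may differ.
    """
    rng = rng or random.Random(0)
    n_offline = int(batch_size * mixing_ratio)
    n_online = batch_size - n_offline
    # Sample with replacement from each source, then shuffle the result.
    batch = rng.choices(offline_data, k=n_offline)
    batch += rng.choices(online_data, k=n_online)
    rng.shuffle(batch)
    return batch


# Toy usage: string labels stand in for transitions.
batch = sample_mixed_batch(["offline"] * 10, ["online"] * 10)
print(len(batch), batch.count("offline"))
```

Under this assumption, the values used in these configs would give 128 offline and 128 online samples per batch.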

35 changes: 35 additions & 0 deletions configs/finetune/rebrac/door/cloned_v1.yaml
@@ -0,0 +1,35 @@
actor_bc_coef: 0.01
actor_learning_rate: 0.0003
actor_ln: false
actor_n_hiddens: 3
batch_size: 256
critic_bc_coef: 0.1
critic_learning_rate: 0.0003
critic_ln: true
critic_n_hiddens: 3
dataset_name: door-cloned-v1
eval_episodes: 100
eval_every: 50000
eval_seed: 42
expl_noise: 0.0
gamma: 0.99
group: rebrac-finetune-door-cloned-v1
hidden_dim: 256
min_decay_coef: 0.5
mixing_ratio: 0.5
name: rebrac-finetune
noise_clip: 0.5
normalize_q: true
normalize_reward: false
normalize_states: false
num_offline_updates: 1000000
num_online_updates: 1000000
num_warmup_steps: 0
policy_freq: 2
policy_noise: 0.2
project: CORL
replay_buffer_size: 2000000
reset_opts: false
tau: 0.005
train_seed: 0
use_calibration: false
35 changes: 35 additions & 0 deletions configs/finetune/rebrac/hammer/cloned_v1.yaml
@@ -0,0 +1,35 @@
actor_bc_coef: 0.1
actor_learning_rate: 0.0003
actor_ln: false
actor_n_hiddens: 3
batch_size: 256
critic_bc_coef: 0.5
critic_learning_rate: 0.0003
critic_ln: true
critic_n_hiddens: 3
dataset_name: hammer-cloned-v1
eval_episodes: 100
eval_every: 50000
eval_seed: 42
expl_noise: 0.0
gamma: 0.99
group: rebrac-finetune-hammer-cloned-v1
hidden_dim: 256
min_decay_coef: 0.5
mixing_ratio: 0.5
name: rebrac-finetune
noise_clip: 0.5
normalize_q: true
normalize_reward: false
normalize_states: false
num_offline_updates: 1000000
num_online_updates: 1000000
num_warmup_steps: 0
policy_freq: 2
policy_noise: 0.2
project: CORL
replay_buffer_size: 2000000
reset_opts: false
tau: 0.005
train_seed: 0
use_calibration: false
35 changes: 35 additions & 0 deletions configs/finetune/rebrac/pen/cloned_v1.yaml
@@ -0,0 +1,35 @@
actor_bc_coef: 0.05
actor_learning_rate: 0.0003
actor_ln: false
actor_n_hiddens: 3
batch_size: 256
critic_bc_coef: 0.5
critic_learning_rate: 0.0003
critic_ln: true
critic_n_hiddens: 3
dataset_name: pen-cloned-v1
eval_episodes: 100
eval_every: 50000
eval_seed: 42
expl_noise: 0.0
gamma: 0.99
group: rebrac-finetune-pen-cloned-v1
hidden_dim: 256
min_decay_coef: 0.5
mixing_ratio: 0.5
name: rebrac-finetune
noise_clip: 0.5
normalize_q: true
normalize_reward: false
normalize_states: false
num_offline_updates: 1000000
num_online_updates: 1000000
num_warmup_steps: 0
policy_freq: 2
policy_noise: 0.2
project: CORL
replay_buffer_size: 2000000
reset_opts: false
tau: 0.005
train_seed: 0
use_calibration: false
35 changes: 35 additions & 0 deletions configs/finetune/rebrac/relocate/cloned_v1.yaml
@@ -0,0 +1,35 @@
actor_bc_coef: 0.1
actor_learning_rate: 0.0003
actor_ln: false
actor_n_hiddens: 3
batch_size: 256
critic_bc_coef: 0.01
critic_learning_rate: 0.0003
critic_ln: true
critic_n_hiddens: 3
dataset_name: relocate-cloned-v1
eval_episodes: 100
eval_every: 50000
eval_seed: 42
expl_noise: 0.0
gamma: 0.99
group: rebrac-finetune-relocate-cloned-v1
hidden_dim: 256
min_decay_coef: 0.5
mixing_ratio: 0.5
name: rebrac-finetune
noise_clip: 0.5
normalize_q: true
normalize_reward: false
normalize_states: false
num_offline_updates: 1000000
num_online_updates: 1000000
num_warmup_steps: 0
policy_freq: 2
policy_noise: 0.2
project: CORL
replay_buffer_size: 2000000
reset_opts: false
tau: 0.005
train_seed: 0
use_calibration: false
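
The configs added in this PR repeat most keys verbatim. The AntMaze and Adroit (door, hammer, pen, relocate) files appear to differ only in a handful of values. The following sketch shows one way to surface those differences. `parse_flat_yaml` and `diff_configs` are hypothetical helpers written for illustration. The parser handles only flat `key: value` lines, not general YAML, and keeps values as strings. The two excerpts are abbreviated copies of `antmaze/umaze_v2.yaml` and `door/cloned_v1.yaml`.

```python
def parse_flat_yaml(text):
    """Parse flat 'key: value' lines into a dict of strings.

    Illustration only: handles the flat configs in this PR,
    not nested or general YAML.
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Abbreviated excerpts of two configs from this PR.
ANTMAZE_UMAZE = """
actor_bc_coef: 0.003
critic_bc_coef: 0.002
critic_learning_rate: 0.00005
gamma: 0.999
normalize_reward: true
tau: 0.005
"""

DOOR_CLONED = """
actor_bc_coef: 0.01
critic_bc_coef: 0.1
critic_learning_rate: 0.0003
gamma: 0.99
normalize_reward: false
tau: 0.005
"""

changed = diff_configs(parse_flat_yaml(ANTMAZE_UMAZE),
                       parse_flat_yaml(DOOR_CLONED))
for key, (antmaze_value, door_value) in changed.items():
    print(f"{key}: {antmaze_value} -> {door_value}")
```

On these excerpts, the differing keys are the two BC coefficients, `critic_learning_rate`, `gamma`, and `normalize_reward`, while shared settings such as `tau` drop out.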
Binary file modified results/bin/finetune_scores.pickle
Binary file not shown.
5 changes: 5 additions & 0 deletions results/get_finetune_scores.py
@@ -32,9 +32,14 @@ def get_run_scores(run_id, is_dt=False):
                break
    for _, row in run.history(keys=[score_key], samples=5000).iterrows():
        full_scores.append(row[score_key])

    for _, row in run.history(keys=["train/regret"], samples=5000).iterrows():
        if "train/regret" in row:
            regret = row["train/regret"]
    for _, row in run.history(keys=["eval/regret"], samples=5000).iterrows():
        if "eval/regret" in row:
            regret = row["eval/regret"]

    offline_iters = len(full_scores) // 2
    return full_scores[:offline_iters], full_scores[offline_iters:], regret
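
The hunk above can be sketched without a W&B connection. In this sketch plain lists and dicts stand in for the rows returned by `run.history(...).iterrows()`, and `split_scores` and `last_regret` are hypothetical names, not functions from the repository. It shows two things: the score history is split in half, which matches the configs in this PR only because `num_offline_updates` and `num_online_updates` are equal and share one `eval_every`, and a logged `eval/regret` takes precedence over `train/regret`.

```python
def split_scores(full_scores):
    """Split a score history into (offline, online) halves.

    Assumes the offline and online phases logged the same number of
    evaluations; with an odd length the extra entry goes to online.
    """
    offline_iters = len(full_scores) // 2
    return full_scores[:offline_iters], full_scores[offline_iters:]


def last_regret(train_rows, eval_rows):
    """Return the last logged regret, preferring 'eval/regret'.

    Rows are plain dicts standing in for W&B history rows.
    Returns None if neither key was ever logged.
    """
    regret = None
    for row in train_rows:
        if "train/regret" in row:
            regret = row["train/regret"]
    for row in eval_rows:
        if "eval/regret" in row:
            regret = row["eval/regret"]
    return regret


offline, online = split_scores([10.0, 20.0, 30.0, 40.0])
regret = last_regret([{"train/regret": 0.3}], [{"eval/regret": 0.1}])
print(offline, online, regret)
```

Because the second loop runs after the first, a run that logs both keys reports the `eval/regret` value. A run that logs neither leaves the regret unset in this sketch, so callers may want to handle `None`.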
