Policy evaluation

Goal

We would like to compare the performance of different policies.

Experiment: a plain-English description of what we are testing with a given policy
Seed: each experiment is run multiple times, once per seed, so that we can capture the variability of the results
Checkpoint: policies are saved at different points during training, and their performance changes noticeably from one saved instance to the next
Episode: each evaluation computes the success rate across the 560 episodes of the I-80 test set
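
The four terms above nest naturally: an experiment has several seeds, each seed has several checkpoints, and each checkpoint is evaluated over the episodes of the test set. The sketch below shows one way the comparison could be aggregated. It assumes a hypothetical in-memory layout, {checkpoint: {seed: [episode outcomes]}}, and the names success_rate and summarise are illustrative, not taken from the project code.

```python
import statistics


def success_rate(outcomes):
    """Fraction of successful episodes in one evaluation (one seed, one checkpoint)."""
    return sum(outcomes) / len(outcomes)


def summarise(results):
    """Aggregate per-seed success rates for each checkpoint.

    results: {checkpoint: {seed: [bool, ...]}}  (hypothetical layout)
    returns: {checkpoint: (mean success rate, standard deviation across seeds)}
    """
    summary = {}
    for checkpoint, per_seed in results.items():
        rates = [success_rate(outcomes) for outcomes in per_seed.values()]
        # The sample standard deviation needs at least two seeds.
        spread = statistics.stdev(rates) if len(rates) > 1 else 0.0
        summary[checkpoint] = (statistics.mean(rates), spread)
    return summary


# Made-up outcomes for illustration only: 4 episodes instead of 560, 2 seeds.
toy_results = {
    "checkpoint_a": {1: [True, True, False, False], 2: [True, True, True, True]},
    "checkpoint_b": {1: [True, False, False, False], 2: [True, True, False, False]},
}
toy_summary = summarise(toy_results)
```

Reporting the spread across seeds next to the mean makes it easier to tell whether a difference between two checkpoints, or two experiments, exceeds the seed-to-seed variability.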

3 experiments:

Stochastic policy -> /misc/vlgscratch4/LecunGroup/nvidia-collab/models_v12/policy_networks/MPUR-policy-gauss-model=vae-zdropout=0.5-policy-gauss-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=1
Deterministic policy, regressed cost -> /misc/vlgscratch4/LecunGroup/nvidia-collab/models_v12/policy_networks/MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=1
Non-regressed cost -> /misc/vlgscratch4/LecunGroup/nvidia-collab/models_v13/policy_networks/MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=False
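
The directory names above encode each run's hyperparameters as key=value pairs, so the settings that differ between two experiments can be read off mechanically. The following is a minimal sketch, assuming that values never contain a hyphen (a negative number would be cut short); parse_hyperparameters and diff_hyperparameters are hypothetical helper names.

```python
import re


def parse_hyperparameters(name):
    """Extract key=value pairs from a model directory name as strings."""
    # A key is a run of word characters; a value runs up to the next hyphen.
    return dict(re.findall(r"(\w+)=([^-]+)", name))


def diff_hyperparameters(name_a, name_b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    a, b = parse_hyperparameters(name_a), parse_hyperparameters(name_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Directory names of the two deterministic experiments, copied from the list above.
regressed = (
    "MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256-bsize=6"
    "-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0"
    "-updatez=0-inferz=0-learnedcost=1"
)
non_regressed = (
    "MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256-bsize=6"
    "-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0"
    "-updatez=0-inferz=0-learnedcost=False"
)
difference = diff_hyperparameters(regressed, non_regressed)
```

For these two names, the only key that differs is learnedcost (1 versus False). The parent directories also differ (models_v12 versus models_v13), which this sketch does not inspect.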