Policy evaluation
Alfredo Canziani edited this page Oct 17, 2019 · 6 revisions
We would like to compare the performance of different policies.
- Experiment: English description of what we're trying to do with a given policy
- Seed: each experiment is run multiple times so that we can capture the variability of the results
- Checkpoint: policies are saved at different points in time, and their performance changes noticeably across saved instances
- Episode: each evaluation consists of computing the success rate across 560 episodes (I-80 test set)
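The hierarchy above (experiment, seed, checkpoint, episode) can be summarised as a success rate per seed and checkpoint, then a mean and spread across seeds. The following is only a minimal sketch under assumptions: the function names, the seed and checkpoint labels, and the per-episode outcomes are all made up for illustration and do not come from the project's evaluation code.

```python
import random
from statistics import mean, stdev

N_EPISODES = 560  # size of the I-80 test set, as stated above


def success_rate(outcomes):
    """Fraction of successful episodes; outcomes is a list of booleans."""
    return sum(outcomes) / len(outcomes)


def summarise(results):
    """Aggregate one experiment.

    results maps (seed, checkpoint) -> list of per-episode booleans.
    Returns checkpoint -> (mean, std) of the success rate across seeds.
    """
    per_checkpoint = {}
    for (seed, checkpoint), outcomes in results.items():
        per_checkpoint.setdefault(checkpoint, []).append(success_rate(outcomes))
    return {
        ckpt: (mean(rates), stdev(rates) if len(rates) > 1 else 0.0)
        for ckpt, rates in sorted(per_checkpoint.items())
    }


# Synthetic outcomes (assumption): 3 seeds, 3 checkpoints, random successes.
rng = random.Random(0)
results = {
    (seed, ckpt): [rng.random() < 0.5 + 0.1 * i for _ in range(N_EPISODES)]
    for seed in (1, 2, 3)
    for i, ckpt in enumerate((10000, 20000, 30000))
}

for ckpt, (m, s) in summarise(results).items():
    print(f"checkpoint {ckpt}: success rate {m:.3f} +/- {s:.3f}")
```

Keeping the per-seed rates separate until the last step makes it possible to see both the variability across seeds and the change in performance across checkpoints.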
3 experiments:
- Stochastic policy ->
/misc/vlgscratch4/LecunGroup/nvidia-collab/models_v12/policy_networks/MPUR-policy-gauss-model=vae-zdropout=0.5-policy-gauss-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=1
- Deterministic policy, regressed cost ->
/misc/vlgscratch4/LecunGroup/nvidia-collab/models_v12/policy_networks/MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=1
- Non-regressed cost ->
/misc/vlgscratch4/LecunGroup/nvidia-collab/models_v13/policy_networks/MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=False
- Full screen single cell: https://github.com/scottlittle/expand-cell-fullscreen
- bqplot: https://github.com/bloomberg/bqplot