Code for the paper *Reducing Exploitability with Population Based Training*. We reduce the exploitability of RL policies to adversarial policies by training against a diverse population of opponents.
Should work with Python 3.7 or 3.8.
Install using Docker or with the following process:

```bash
conda create -n defense python=3.8
```

Install the necessary packages:

```bash
pip install -r requirements.txt
pip install -r requirements-dev.txt
```

For generating videos:

```bash
conda install ffmpeg
```
`ffmpeg` can optionally also be installed with your system's package manager.
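For the Docker route mentioned above, the usual build-and-run commands apply. This is only a sketch: it assumes a Dockerfile at the repository root, and the image name is illustrative.

```bash
# Sketch: build the image from the repository root and start a container.
docker build -t pbt-defense .
docker run -it pbt-defense
```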
To change the output path, change `TrialSettings.out_path` via gin-config. This can be overwritten with the environment variable `POLICY_DEFENSE_OUT`.
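For example, the output path can be set either through a gin binding (using the same `-p` flag as in the training examples below) or through the environment variable; the path value here is illustrative.

```bash
# Option 1: bind TrialSettings.out_path via gin (path is illustrative)
python -m aprl_defense.train \
    -f "gin/icml/selfplay/laser_tag.gin" \
    -p "TrialSettings.out_path = '/tmp/defense_out'"

# Option 2: override the output path via the environment variable
export POLICY_DEFENSE_OUT=/tmp/defense_out
python -m aprl_defense.train -f "gin/icml/selfplay/laser_tag.gin"
```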
Most frequently used settings can be changed via gin.
The settings intended to be configured with gin are listed below (an example gin file follows the list):

- `TrialSettings` (`aprl_defense.trial.settings.TrialSettings`)
- `RLSettings` (`aprl_defense.trial.settings.RLSettings`)
- Additionally, depending on which of these modes is used:
  - `selfplay` (`aprl_defense.training_managers.simple_training_manager.SelfplayTrainingManager`)
  - `single-agent` - no additional arguments
  - `attack` (`aprl_defense.training_managers.simple_training_manager.AttackManager`)
  - `pbt` (`aprl_defense.training_managers.pbt_manager.PBTManager`)
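As an illustration, a small gin file could bundle a few of the `TrialSettings` bindings used in the examples below; the file name and all values are placeholders.

```
# my_experiment.gin -- illustrative example; all values are placeholders
TrialSettings.out_path = '/tmp/defense_out'
TrialSettings.num_workers = 10
TrialSettings.wandb_group = 'experiment'
```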
For further documentation on the configurable parameters, check the documentation of the respective classes.
Experiments for the paper were run with the settings in `src/gin/icml`.
To change hyperparameters, we recommend creating RLlib configs that can be passed in via the `override` / `override_f` gin settings.
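The exact interface of `override` / `override_f` is not documented here, so treat the following as an assumption: if `override` can be bound to a dictionary of RLlib config entries (the binding target `RLSettings.override` is a guess), a learning-rate change might look like this.

```bash
# Assumption: RLSettings.override accepts a dict of RLlib config keys;
# check the RLSettings documentation for the actual parameter name and format.
python -m aprl_defense.train \
    -f "gin/icml/selfplay/laser_tag.gin" \
    -p "RLSettings.override = {'lr': 1e-4}"
```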
Many experiments were run using dedicated Python scripts, located in `src/experiments`.
The following examples should clarify how to specify training for the different modes (run from the `src` folder).
Selfplay:

```bash
python -m aprl_defense.train \
    -f "gin/icml/selfplay/laser_tag.gin" \
    -p "TrialSettings.num_workers = 10" \
    -p "TrialSettings.wandb_group = 'experiment'"
```
Attack:

```bash
python -m aprl_defense.train \
    -f "gin/icml/attack/sp_laser_tag.gin" \
    -p "TrialSettings.num_workers = 10" \
    -p "TrialSettings.wandb_group = 'experiment'" \
    -p "attack.victim_artifact = '<wandb artifact id>'" \
    -p "attack.victim_policy_name = '<name of victim policy>'"
```
PBT (attention: PBT only runs with the modified version of ray):

```bash
python -m aprl_defense.train \
    -f "gin/icml/pbt/laser_tag.gin" \
    -p "TrialSettings.wandb_group = 'experiment'" \
    -p "pbt.main_id = 0" \
    -p "pbt.num_ops = 50" \
    -p "TrialSettings.num_workers = 50"
```
In all but the most basic setups, creating an RLlib config for multi-agent training requires programmatically building the config in Python, so these configs cannot be created simply by passing in a config file. For convenience, the most commonly changed hyperparameters and set-up configurations can be changed with gin; additional modifications can be performed by overriding the RLlib config.