# Revisiting the Gumbel-Softmax in MADDPG

Exploration of alternative gradient estimation techniques in MADDPG.

## Hyperparameters

Hyperparameters used for the core MADDPG algorithm, mostly taken verbatim from *Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks* by Papoudakis et al. (2021), for the Level-Based Foraging (LBF) and Multi-Robot Warehouse (RWARE) environments:

| Hyperparameter | LBF | RWARE |
| --- | --- | --- |
| network type | MLP | MLP |
| hidden dimensions | (64, 64) | (64, 64) |
| learning rate | 3e-4 | 3e-4 |
| reward standardisation | True | True |
| policy regulariser | 0.001 | 0.001 |
| target update β | 0.01 | 0.01 |
| max timesteps | 25 | 500 |
| training interval (steps) | 25 | 50 |
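The target update β in the table appears to be the coefficient of the soft (Polyak) target-network update, exposed on the command line as `--soft_update_size` (see Code Usage below). A minimal sketch of that update, assuming PyTorch modules named `net` and `target_net` here purely for illustration:

```python
import torch

@torch.no_grad()
def soft_update(net: torch.nn.Module, target_net: torch.nn.Module, beta: float = 0.01) -> None:
    """Polyak-average the online parameters into the target network."""
    for p, p_target in zip(net.parameters(), target_net.parameters()):
        # target <- (1 - beta) * target + beta * online
        p_target.mul_(1.0 - beta).add_(beta * p)
```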

Hyperparameter details for the various gradient estimation techniques, with the chosen parameters listed for the two environments:

| Estimator | Range Explored | LBF | RWARE |
| --- | --- | --- | --- |
| STGS-1 | τ = 1.0 | 1.0 | 1.0 |
| STGS-T | τ ∈ (0, 1) | 0.5 | 0.6 |
| TAGS | τ annealed from [1, 5] to [0.1, 0.5] | 4.0 → 0.1 | 1.0 → 0.3 |
| GRMCK | τ ∈ (0, 1]; K ∈ {5, 10, 50} | τ = 0.5; K = 10 | τ = 0.7; K = 5 |
| GST | τ ∈ (0, 1] | 0.7 | 0.7 |
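All of these estimators relax the discrete action sample so that the centralised critic's gradient can reach the actor. As a reference point, here is a minimal sketch (not the repository's code) of a straight-through Gumbel-Softmax (STGS) sample with temperature τ in PyTorch; the built-in `torch.nn.functional.gumbel_softmax(logits, tau=tau, hard=True)` provides the same behaviour.

```python
import torch
import torch.nn.functional as F

def stgs_sample(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Straight-through Gumbel-Softmax: hard one-hot forward, soft gradient backward."""
    # Sample Gumbel(0, 1) noise and form the temperature-scaled relaxed sample.
    gumbels = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    soft = F.softmax((logits + gumbels) / tau, dim=-1)
    # Hard one-hot taken from the relaxed sample's argmax.
    hard = F.one_hot(soft.argmax(dim=-1), num_classes=logits.shape[-1]).to(soft.dtype)
    # Straight-through trick: forward value of `hard`, gradient of `soft`.
    return hard + (soft - soft.detach())
```

STGS-1 fixes τ = 1.0, STGS-T treats τ as a tuned hyperparameter, and TAGS anneals τ between the `--tags_start` and `--tags_end` values over `--tags_period` steps; GRMCK (Gumbel-Rao with K Monte Carlo samples; `--rao_k`) and GST (Gapped Straight-Through; `--gst_gap`) build variance reduction on top of the same relaxation.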

## Code Usage

```
python main.py [-h] [--config_file CONFIG_FILE] [--env ENV] [--seed SEED] [--warmup_episodes WARMUP_EPISODES] [--replay_buffer_size REPLAY_BUFFER_SIZE]
               [--total_steps TOTAL_STEPS] [--max_episode_length MAX_EPISODE_LENGTH] [--train_repeats TRAIN_REPEATS] [--batch_size BATCH_SIZE]
               [--hidden_dim_width HIDDEN_DIM_WIDTH] [--critic_lr CRITIC_LR] [--actor_lr ACTOR_LR] [--gradient_clip GRADIENT_CLIP] [--gamma GAMMA]
               [--soft_update_size SOFT_UPDATE_SIZE] [--policy_regulariser POLICY_REGULARISER] [--reward_per_agent] [--standardise_rewards] [--eval_freq EVAL_FREQ]
               [--eval_iterations EVAL_ITERATIONS] [--gradient_estimator {stgs,grmck,gst,tags}] [--gumbel_temp GUMBEL_TEMP] [--rao_k RAO_K] [--gst_gap GST_GAP]
               [--tags_start TAGS_START] [--tags_end TAGS_END] [--tags_period TAGS_PERIOD] [--save_agents] [--pretrained_agents PRETRAINED_AGENTS] [--just_demo_agents]
               [--render] [--disable_training] [--wandb_project_name WANDB_PROJECT_NAME] [--disable_wandb] [--offline_wandb] [--log_grad_variance]
               [--log_grad_variance_interval LOG_GRAD_VARIANCE_INTERVAL]

options:
  -h, --help            show this help message and exit
  --config_file CONFIG_FILE
  --env ENV
  --seed SEED
  --warmup_episodes WARMUP_EPISODES
  --replay_buffer_size REPLAY_BUFFER_SIZE
  --total_steps TOTAL_STEPS
  --max_episode_length MAX_EPISODE_LENGTH
  --train_repeats TRAIN_REPEATS
  --batch_size BATCH_SIZE
  --hidden_dim_width HIDDEN_DIM_WIDTH
  --critic_lr CRITIC_LR
  --actor_lr ACTOR_LR
  --gradient_clip GRADIENT_CLIP
  --gamma GAMMA
  --soft_update_size SOFT_UPDATE_SIZE
  --policy_regulariser POLICY_REGULARISER
  --reward_per_agent
  --standardise_rewards
  --eval_freq EVAL_FREQ
  --eval_iterations EVAL_ITERATIONS
  --gradient_estimator {stgs,grmck,gst,tags}
  --gumbel_temp GUMBEL_TEMP
  --rao_k RAO_K
  --gst_gap GST_GAP
  --tags_start TAGS_START
  --tags_end TAGS_END
  --tags_period TAGS_PERIOD
  --save_agents
  --pretrained_agents PRETRAINED_AGENTS
  --just_demo_agents
  --render
  --disable_training
  --wandb_project_name WANDB_PROJECT_NAME
  --disable_wandb
  --offline_wandb
  --log_grad_variance
  --log_grad_variance_interval LOG_GRAD_VARIANCE_INTERVAL
```
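For example, a run with the plain straight-through estimator at τ = 1.0 might look like the following (the environment name is a placeholder; substitute the LBF or RWARE ID your config expects):

```
python main.py --env <env-id> --seed 0 --gradient_estimator stgs --gumbel_temp 1.0 --standardise_rewards
```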
