address reproducibility
Tianshu Chu committed Sep 25, 2019
1 parent fc4d273 commit f95e0db
Showing 14 changed files with 20 additions and 13 deletions.
17 changes: 12 additions & 5 deletions README.md
@@ -32,9 +32,9 @@ First define all hyperparameters in a config file under `[config_dir]`, and crea

1. To train a new agent, run
~~~
-python3 main.py --base-dir [base_dir] train --config-dir [config_dir] --test-mode no_test
+python3 main.py --base-dir [base_dir]/[agent] train --config-dir [config_dir] --test-mode no_test
~~~
-`no_test` is suggested, since tests will significantly slow down the training speed.
+`[agent]` is from `{ia2c, ma2c, iqll, iqld}`. `no_test` is suggested, since tests will significantly slow down the training speed.

2. To access tensorboard during training, run
~~~
@@ -43,19 +43,26 @@ tensorboard --logdir=[base_dir]/log

3. To evaluate and compare trained agents, run
~~~
-python3 main.py --base-dir [base_dir] evaluate --agents [agent names] --evaluate-seeds [seeds]
+python3 main.py --base-dir [base_dir] evaluate --agents [agents] --evaluate-seeds [seeds]
~~~
Evaluation data will be output to `[base_dir]/eva_data`, and make sure evaluation seeds are different from those used in training.

4. To visualize the agent behavior, run
~~~
-python3 main.py --base-dir [base_dir] evaluate --agents [agent name] --evaluate-seeds [seed] --demo
+python3 main.py --base-dir [base_dir] evaluate --agents [agent] --evaluate-seeds [seed] --demo
~~~
It is recommended to have only one agent and one evaluation seed for the demo run. This will launch the SUMO GUI, and `./large_grid/data/view.xml` can be applied to visualize queue length and intersectin delay in edge color and thickness. Below are a few example screenshots.

| t=1500s | t=2500s | t=3500s
:-------------------:|:--------------------:|:--------------------:
-![](./demo/1500.png) | ![](./demo/2500.png) | ![](./demo/3500.png)
+![](./figs/1500.png) | ![](./figs/2500.png) | ![](./figs/3500.png)

+## Reproducibility
+Due to SUMO version change and a few corresponding code modifications (e.g. `tau="0.5"` has to be removed from `vType` to prevent extensive vehicle collisions in simulation), it becomes difficult to reproduce paper results. So we have re-run the experiments using the latest master and SUMO 1.1.0 and provided the following training plots as reference. The conclusion still remains the same, that is, MA2C ~ IQL-LR > IA2C in large grid and MA2C > IA2C > IQL-LR in Monaco net. Note rather than reproducing exactly the same results, an evaluation is always valid as far as the comparison is fair, that is, fixing env config and seed across agents.
+
+| large grid | Monaco net
+:-------------------------------:|:------------------------------:
+![](./figs/large_grid_train.png) | ![](./figs/real_net_train.png)
+
## Citation
If you find this useful in your research, please cite our paper "Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control" ([early access version](https://ieeexplore.ieee.org/document/8667868), [preprint version](https://arxiv.org/pdf/1903.04527.pdf)):
2 changes: 1 addition & 1 deletion config/config_ia2c_large.ini
@@ -29,7 +29,7 @@ control_interval_sec = 5
; agent is greedy, iqll, iqld, ia2c, ma2c, a2c.
agent = ia2c
; coop discount is used to discount the neighbors' impact
-coop_gamma = 0.75
+coop_gamma = 0.9
data_path = ./large_grid/data/
episode_length_sec = 3600
; the normailization is based on typical values in sim
2 changes: 1 addition & 1 deletion config/config_ia2c_real.ini
@@ -29,7 +29,7 @@ control_interval_sec = 5
; agent is greedy, iqll, iqld, ia2c, ma2c, a2c.
agent = ia2c
; coop discount is used to discount the neighbors' impact
-coop_gamma = 0.75
+coop_gamma = 0.9
data_path = ./real_net/data/
episode_length_sec = 3600
; the normailization is based on typical values in sim
2 changes: 1 addition & 1 deletion config/config_iqld_large.ini
@@ -26,7 +26,7 @@ control_interval_sec = 5
; agent is greedy, iqll, iqld, ia2c, ma2c, a2c.
agent = iqld
; coop discount is used to discount the neighbors' impact
-coop_gamma = 0.75
+coop_gamma = 0.9
data_path = ./large_grid/data/
episode_length_sec = 3600
; the normailization is based on typical values in sim
2 changes: 1 addition & 1 deletion config/config_iqld_real.ini
@@ -26,7 +26,7 @@ control_interval_sec = 5
; agent is greedy, iqll, iqld, ia2c, ma2c, a2c.
agent = iqld
; coop discount is used to discount the neighbors' impact
-coop_gamma = 0.75
+coop_gamma = 0.9
data_path = ./real_net/data/
episode_length_sec = 3600
; the normailization is based on typical values in sim
2 changes: 1 addition & 1 deletion config/config_iqll_large.ini
@@ -24,7 +24,7 @@ control_interval_sec = 5
; agent is greedy, iqll, iqld, ia2c, ma2c, a2c.
agent = iqll
; coop discount is used to discount the neighbors' impact
-coop_gamma = 0.75
+coop_gamma = 0.9
data_path = ./large_grid/data/
episode_length_sec = 3600
; the normailization is based on typical values in sim
2 changes: 1 addition & 1 deletion config/config_iqll_real.ini
@@ -24,7 +24,7 @@ control_interval_sec = 5
; agent is greedy, iqll, iqld, ia2c, ma2c, a2c.
agent = iqll
; coop discount is used to discount the neighbors' impact
-coop_gamma = 0.75
+coop_gamma = 0.9
data_path = ./real_net/data/
episode_length_sec = 3600
; the normailization is based on typical values in sim
2 changes: 1 addition & 1 deletion config/config_ma2c_large.ini
@@ -30,7 +30,7 @@ control_interval_sec = 5
; agent is greedy, iqll, iqld, ia2c, ma2c, a2c.
agent = ma2c
; coop discount is used to discount the neighbors' impact
-coop_gamma = 0.75
+coop_gamma = 0.9
data_path = ./large_grid/data/
episode_length_sec = 3600
; the normailization is based on typical values in sim
2 changes: 1 addition & 1 deletion config/config_ma2c_real.ini
@@ -30,7 +30,7 @@ control_interval_sec = 5
; agent is greedy, iqll, iqld, ia2c, ma2c, a2c.
agent = ma2c
; coop discount is used to discount the neighbors' impact
-coop_gamma = 0.75
+coop_gamma = 0.9
data_path = ./real_net/data/
episode_length_sec = 3600
; the normailization is based on typical values in sim
Binary file added figs/1500.png
Binary file added figs/2500.png
Binary file added figs/3500.png
Binary file added figs/large_grid_train.pdf
Binary file not shown.
Binary file added figs/real_net_train.pdf
Binary file not shown.
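
This commit sets `coop_gamma` to the same value in all eight config files it touches, in line with the README's point that a comparison is fair only when the env config is fixed across agents. The following is a minimal sketch of how that consistency could be checked. It is not part of the commit: `collect_values` and `is_consistent` are hypothetical helper names, and because the diff does not show which INI section holds `coop_gamma`, the sketch searches every section.

```python
import configparser
from pathlib import Path


def collect_values(config_dir, key="coop_gamma"):
    """Map each .ini file name in config_dir to the raw string value of `key`.

    Files that do not define the key map to None.
    """
    values = {}
    for path in sorted(Path(config_dir).glob("*.ini")):
        # Interpolation is disabled so literal '%' characters cannot raise.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(path)
        found = None
        # Assumption: the section name is unknown, so look in all sections.
        for section in parser.sections():
            if parser.has_option(section, key):
                found = parser.get(section, key)
                break
        values[path.name] = found
    return values


def is_consistent(values):
    """True when at least one file exists and all files agree on a value."""
    return (
        bool(values)
        and None not in values.values()
        and len(set(values.values())) == 1
    )
```

Assuming it is run from the repository root, `is_consistent(collect_values("config"))` would report whether every config file under `config/` agrees on `coop_gamma`. Values are compared as strings, so `0.9` and `0.90` would count as different.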