TamaGo has a function to execute Gumbel AlphaZero-style reinforcement learning.
GNUGo is used to correct the results of self-play games during reinforcement learning. Reinforcement learning can proceed without GNUGo, so installing it is optional. However, I recommend using GNUGo because TamaGo's win/loss judgment at the end of a game is very rough.
To install GNUGo, simply execute the following command on Ubuntu.
```
apt install gnugo
```
Hyperparameters for reinforcement learning are defined in learning_param.py.
Hyperparameter | Description | Example of value | Note |
---|---|---|---|
RL_LEARNING_RATE | Learning rate for reinforcement learning. | 0.01 | It is good to change this to a smaller value once learning has progressed to some extent. |
BATCH_SIZE | Mini-batch size for training. | 256 | Set this to a smaller value if your GPU memory is small. |
MOMENTUM | Momentum parameter for an optimizer. | 0.9 | |
WEIGHT_DECAY | Weight of L2-regularization. | 1e-4 (0.0001) | |
DATA_SET_SIZE | Number of data to be stored in a npz file. | BATCH_SIZE * 4000 | |
RL_VALUE_WEIGHT | Weight of value loss against policy loss. | 1.0 | This must be more than 0.0. |
SELF_PLAY_VISITS | The number of visits per move for self-play. | 16 | This must be more than 1. |
NUM_SELF_PLAY_WORKERS | The number of self-play workers. | 4 | |
NUM_SELF_PLAY_GAMES | The number of self-play games generated. | 10000 | |
Since these hyperparameters are the values used to confirm that reinforcement learning progresses well, please use them as they are at first, and change them gradually while checking the learning status when you experiment.
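For reference, the table above corresponds to plain Python constants in learning_param.py along the following lines (an illustrative excerpt using the example values from the table, not necessarily the exact contents of the file):

```python
# learning_param.py (illustrative values, matching the table above)
RL_LEARNING_RATE = 0.01            # lower this once learning has progressed
BATCH_SIZE = 256                   # reduce if GPU memory is small
MOMENTUM = 0.9
WEIGHT_DECAY = 1e-4
DATA_SET_SIZE = BATCH_SIZE * 4000  # number of data stored in one npz file
RL_VALUE_WEIGHT = 1.0              # must be greater than 0.0
SELF_PLAY_VISITS = 16              # must be greater than 1
NUM_SELF_PLAY_WORKERS = 4
NUM_SELF_PLAY_GAMES = 10000
```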
The neural network is defined in the following four files.
File | Definition |
---|---|
nn/network/dual_net.py | Neural network definition. |
nn/network/res_block.py | Residual block definition. |
nn/network/head/policy_head.py | Policy head definition. |
nn/network/head/value_head.py | Value head definition. |
If you want to change the network structure, I recommend changing the value of filters or blocks first.
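As a rough orientation only (this is not TamaGo's actual code; the real definitions live in the four files above), a dual-headed residual network of this kind, parameterized by filters and blocks, typically looks like the following PyTorch sketch:

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """One residual block: two 3x3 convolutions with a skip connection."""
    def __init__(self, filters: int):
        super().__init__()
        self.conv1 = nn.Conv2d(filters, filters, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(filters)
        self.conv2 = nn.Conv2d(filters, filters, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(filters)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)


class DualNet(nn.Module):
    """Residual trunk followed by a policy head and a value head."""
    def __init__(self, in_planes: int, filters: int = 64, blocks: int = 6,
                 board_size: int = 9):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, filters, 3, padding=1, bias=False),
            nn.BatchNorm2d(filters),
            nn.ReLU(),
        )
        self.trunk = nn.Sequential(*[ResidualBlock(filters) for _ in range(blocks)])
        num_points = board_size * board_size
        # Policy head: distribution over all board points plus pass.
        self.policy_head = nn.Sequential(
            nn.Conv2d(filters, 2, 1),
            nn.Flatten(),
            nn.Linear(2 * num_points, num_points + 1),
        )
        # Value head: scalar evaluation of the position in [-1, 1].
        self.value_head = nn.Sequential(
            nn.Conv2d(filters, 1, 1),
            nn.Flatten(),
            nn.Linear(num_points, filters),
            nn.ReLU(),
            nn.Linear(filters, 1),
            nn.Tanh(),
        )

    def forward(self, x):
        h = self.trunk(self.stem(x))
        return self.policy_head(h), self.value_head(h)
```

Increasing filters widens every block, while increasing blocks deepens the trunk; both directly trade playing strength against training and inference cost.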
The reinforcement learning process runs in the following order.
- Generating a specified number of self-play games using the existing neural network model.
- Adjusting the results of the self-play games using GNUGo (optional).
- Training the neural network using the SGF files generated by the self-play process.
- Repeating steps 1 to 3.
The reinforcement learning pipeline is defined in pipeline.sh.
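Conceptually, the loop that pipeline.sh drives can be pictured with the following Python sketch (hypothetical; it only uses the command line options documented below, and the actual script may structure the steps differently):

```python
import subprocess

MODEL_PATH = "model/rl-model.bin"   # default path of --model
ARCHIVE_DIR = "archive"             # default value of --save-dir / --kifu-dir

# Repeat: self-play -> (optional GNUGo correction) -> training.
for iteration in range(100):        # the number of iterations is arbitrary here
    # Step 1: generate self-play games with the current model.
    subprocess.run([
        "python", "selfplay_main.py",
        "--save-dir", ARCHIVE_DIR,
        "--model", MODEL_PATH,
        "--use-gpu", "true",
    ], check=True)

    # Step 2 (optional): correct the game results with GNUGo.
    # Omitted here; see pipeline.sh for how the correction step is invoked.

    # Step 3: train the network on the generated SGF files.
    subprocess.run([
        "python", "train.py",
        "--kifu-dir", ARCHIVE_DIR,
        "--rl", "true",
        "--use-gpu", "true",
    ], check=True)
```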
Command line options for selfplay_main.py
Option | Description | Example of value | Default value | Note |
---|---|---|---|---|
--save-dir | Directory path to save SGF files generated by the self-play process. | save_dir | archive | |
--process | The number of self-play workers. | 2 | NUM_SELF_PLAY_WORKERS | |
--num-data | The number of self-play games generated. | 5000 | NUM_SELF_PLAY_GAMES | |
--size | Go board size. | 9 | 9 | |
--use-gpu | Flag to use a GPU. | true | true | Value is true or false. |
--visits | The number of visits per move for self-play. | 100 | SELF_PLAY_VISITS | |
--model | Path to a model file. | model/rl-model.bin | model/rl-model.bin | |
Command line options for train.py
Option | Description | Example of value | Default value | Note |
---|---|---|---|---|
--kifu-dir | Path to the directory containing SGF files. | /home/user/sgf_files | None | |
--size | Go board size. | 5 | 9 | |
--use-gpu | Flag to use a GPU. | true | true | Value is true or false. |
--rl | Flag to execute reinforcement learning. | false | false | |
--window-size | Window size for reinforcement learning. | 500000 | 300000 | |