GitHub - SamGijsen/AlphaZero_Homebrew: Homebrew implementation of AlphaZero using NumPy/PyTorch

AlphaZero Homebrew

Homebrew implementation of AlphaZero using NumPy/PyTorch for TicTacToe. I've written a bit more about the implementation here.

The algorithm leverages the following pieces:

mcts.py: Monte Carlo tree search, used to plan ahead by simulating game roll-outs.
neural_nets.py: Includes the PyTorch models. (Note: The original publication uses only a single network.)
- Policy net: Outputs a probability vector over possible moves given a game state. Used by MCTS to provide a prior distribution.
- Value/Target net: Outputs a Q-value for a given game state. Used by MCTS to evaluate a state of a game that is not (yet) completed.
self_play.py: Uses MCTS to run games and logs all activity for evaluation and training.
TTT_env.py: TicTacToe environment.
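
To illustrate how a policy prior and a value estimate can work together inside MCTS, here is a minimal sketch of a PUCT-style selection rule, the kind commonly used in AlphaZero-like searches. It is not taken from mcts.py: the names `puct_scores`, `select_move` and `c_puct` are hypothetical, and the `+ 1` under the square root is a choice made here so that the prior already matters before any move has been visited. It uses only the standard library to stay dependency-free.

```python
import math


def puct_scores(priors, visit_counts, value_sums, c_puct=1.5):
    """Return one score per move: mean value Q plus a prior-weighted bonus U.

    priors       -- policy-net probabilities for each move (assumed input)
    visit_counts -- how often each move was tried in the search so far
    value_sums   -- summed value estimates backed up through each move
    c_puct       -- exploration constant (hypothetical default)
    """
    total_visits = sum(visit_counts)
    scores = []
    for prior, visits, value_sum in zip(priors, visit_counts, value_sums):
        # Q: average backed-up value; 0 for moves that were never visited.
        q = value_sum / visits if visits > 0 else 0.0
        # U: large for moves with a high prior and few visits.
        u = c_puct * prior * math.sqrt(total_visits + 1) / (1 + visits)
        scores.append(q + u)
    return scores


def select_move(priors, visit_counts, value_sums, c_puct=1.5):
    """Pick the index of the move with the highest PUCT score."""
    scores = puct_scores(priors, visit_counts, value_sums, c_puct)
    return max(range(len(scores)), key=scores.__getitem__)
```

With no visits yet, the move with the largest prior is selected; as visits accumulate, the bonus for that move shrinks and the averaged value estimates take over.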

The resulting Self_Play and Training classes allow for an iterative process: first, a batch of games is played, and those games then serve as training data for the DNNs. If all goes well, the trained DNNs play better on the next batch of games, and so on. This play-training loop is implemented in the notebook (TicTacToe_AlphaZero.ipynb). Here is a preview:

for v in range(iterations):

    # Start with self-play.
    print("Self-Play: Iter {} out of {}".format(v + 1, iterations))

    engine = Self_Play(games=games, depth=depth, temperature=temperature, parameter_path=parameter_path)
    state_log, mcts_log, win_log = engine.play(version=v)

    # Train the DNNs on the games that were just played.
    print("Train: Policy & Value: Iter Net {} out of {}".format(v + 1, iterations))

    # Create the trainer once and reuse it in later iterations.
    if v == 0:
        train = Training()

    pnet, losses = train.train_policy(
        state_log, mcts_log, win_log, version=v, parameter_path=parameter_path,
        lr=lr_p, batchsize=batchsize_p, epochs=epochs_p,
    )
    vnet, losses = train.train_value(
        state_log, mcts_log, win_log, version=v, parameter_path=parameter_path,
        lr=lr_v, batchsize=batchsize_v, epochs=epochs_v,
    )
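
The loop above passes a `temperature` to Self_Play and gets back an `mcts_log`, but this README does not spell out how either is used. In AlphaZero-style training, a common approach is to turn the MCTS visit counts at each position into a probability vector, sharpened or flattened by the temperature, and to use that vector both for sampling moves and as the policy net's training target. The following is a minimal sketch under that assumption; `visits_to_policy` is a hypothetical helper, not a function from this repository.

```python
def visits_to_policy(visit_counts, temperature=1.0):
    """Convert MCTS visit counts into a probability vector over moves.

    temperature = 1 keeps probabilities proportional to visit counts,
    values below 1 sharpen the distribution, and temperature <= 0 is
    treated here as greedy (all mass on the most-visited moves).
    """
    if not visit_counts or sum(visit_counts) <= 0:
        raise ValueError("need at least one visited move")

    if temperature <= 0:
        # Greedy limit: split the mass evenly among the most-visited moves.
        best = max(visit_counts)
        winners = [1.0 if n == best else 0.0 for n in visit_counts]
        return [w / sum(winners) for w in winners]

    # Raise counts to the power 1/temperature, then normalise to sum to 1.
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]
```

For example, visit counts of `[1, 3]` give `[0.25, 0.75]` at temperature 1, and a lower temperature shifts more of the mass onto the second move.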

