From a8c0e27e36e0c3d87904bd6ebc6cc57e9dc469a7 Mon Sep 17 00:00:00 2001
From: Chris Hitzel
Date: Fri, 31 May 2024 13:45:58 -0400
Subject: [PATCH] alphacombgame

---
 _posts/2024-05-16-alphazero-on-some-combinatorial-games.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/_posts/2024-05-16-alphazero-on-some-combinatorial-games.md b/_posts/2024-05-16-alphazero-on-some-combinatorial-games.md
index f7743f9..5b170f8 100644
--- a/_posts/2024-05-16-alphazero-on-some-combinatorial-games.md
+++ b/_posts/2024-05-16-alphazero-on-some-combinatorial-games.md
@@ -2,7 +2,7 @@
 layout: note
 title: "Trying AlphaZero on Some Combinatorial Games"
 date: 2024-05-16 00:00:00 -0500
-modified_date: 2024-05-30 19:30:00 -0500
+modified_date: 2024-05-31 13:30:00 -0500
 categories: math,rl
 ---
 
@@ -10,7 +10,9 @@ One lazy thing to do when you want to try and learn a few things is to smash the
 
 ~~Current status: implemented a bad version of AlphaZero in Julia using the monkey-see-monkey-do method (monkey saw a tutorial+repo and a different Julia implementation). Working through some BatchNorm instability and figuring out if monkey did what monkey saw correctly. Will need to compare against the referenced existing implementations and troubleshoot. May wind up just using one of those two once I feel that I "get it."~~
 
-Current status (5/30ish): Coming back to this after being busy w/ some work. Still figuring out what I am doing wrong w.r.t. BatchNorm. I'm thinking I did something stupid and obvious (only in retrospect of course). Going to use a canned library after some more wrestling as I am pretty eager to poke at the policies learned for some combinatorial games. I can always circle back to implementation.
+~~Current status (5/30ish): Coming back to this after being busy w/ some work. Still figuring out what I am doing wrong w.r.t. BatchNorm. I'm thinking I did something stupid and obvious (only in retrospect of course). Going to use a canned library after some more wrestling as I am pretty eager to poke at the policies learned for some combinatorial games. I can always circle back to implementation.~~
+
+5/31: Some lunchtime fiddling. Left this thing to run (on CPU...) last night while out. Seems to always be selecting k = 1 in the (modified) Euclid game I have set up. I might have implemented something wrong, but I have read that people have issues centered around finding the correct \\(c_{puct}\\) constant value. Going to let it rip with \\(c_{puct}\\) = 5 and see if that improves anything. Otherwise going to compare (again) what I implemented to one of those other libraries to see if that points me toward an answer (incl. re: BatchNorm instability).
 
 RL/Alphazero ref dump:
 * [A Simple Alpha(Go) Zero Tutorial](https://suragnair.github.io/posts/alphazero.html)