Interpreting the training curves on Tic Tac Toe #152
I ran the default like you did and got the same results as you: good graphs, but MuZero performs poorly in actual games. For my second experiment, I changed it from "expert" to "self", and these are my results. The loss is very good, so it makes me believe the AI learned. Both MuZero and the opponent are heading to 0 reward (I believe this is expected, right? More draws), but it still performs poorly in reality. I was also thinking the same as you. I have made a code update: Win = 1, Draw = 0, and Loss = -1, which is what DeepMind reported in their paper. I will let you know how that experiment goes.
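(For context, a minimal sketch of that reward mapping; the function name and signature here are illustrative, not the repository's actual API:)

```python
# Illustrative reward mapping for a finished tic-tac-toe game:
# win = +1, draw = 0, loss = -1 (the scheme reported by DeepMind).
# `outcome_reward` is a hypothetical helper, not a function from this repo.
def outcome_reward(winner, player):
    if winner is None:  # draw (no winner)
        return 0
    return 1 if winner == player else -1

assert outcome_reward(winner="X", player="X") == 1    # win
assert outcome_reward(winner=None, player="X") == 0   # draw
assert outcome_reward(winner="O", player="X") == -1   # loss
```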
@theword OK, looking forward to hearing from you.
Hi, I also tried to train it "as is" and was not able to get good performance. As said, even though the plots show some progress, it plays quite badly. I opened an issue with the same problem: even after training for almost 10 hours, I still do not reach the performance shown in the attached files, which I think is very good. Hope it helps. Has anyone else had success with the training?
I was able to get really impressive graphs, but it would still lose when I played against it. I expected it to draw every time.
Same here. I was not able to train it to play tic-tac-toe. I'm shifting to another AI algorithm since I'm unable to train anything here.
@JohnPPP / @theword, I've modified the tic-tac-toe config a bit and got some really good results. It looks like the learning rate wasn't high enough, and it should be using SGD instead of Adam. I also had to make sure there were as many workers as I had CPU threads to speed up the process. I then started tuning a learning rate decay over time. I played against it for the first time and, embarrassingly, it beat me. 🤣 I've tested it and it's pretty good. I've attached a screenshot of my TensorBoard showing MuZero learning strongly while the opponent's reward goes to 0.
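(A rough sketch of the kind of config changes described above, in the style of muzero-general's per-game `MuZeroConfig`; the attribute names follow that convention, but the specific values below are placeholders, not the tuned numbers from this thread:)

```python
# Sketch of the described changes in a muzero-general-style MuZeroConfig.
# Values are placeholders, not the tuned numbers from this thread.
import multiprocessing


class MuZeroConfig:
    def __init__(self):
        self.optimizer = "SGD"       # switch from "Adam" to plain SGD
        self.lr_init = 0.02          # higher initial learning rate (placeholder)
        self.lr_decay_rate = 0.9     # decay the learning rate over time
        self.lr_decay_steps = 10000  # steps between decays (placeholder)
        # one self-play worker per CPU thread to speed up data generation
        self.num_workers = multiprocessing.cpu_count()
```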
@AdrianAcala, that is great! Congrats on your improvements! I've switched from this MuZero implementation to an AlphaZero one. I must say that one is much better designed and implemented, and it's easier to create new environments for it. However, I still cannot tune anything... It appears the fault is mine alone, and that's good. Wishing everyone who comes across this code great success in the AI world.
@AdrianAcala Could you share your hyperparameter config?
@theword, I'm running into issues replicating it. The learning rate is way too high and sometimes blows up to infinity. But when I lower the learning rate, it takes forever to learn.
@theword, AHA! I was able to increase the learning rate without it blowing up by also increasing the batch size. I've tried this twice now, and both runs show great results. Will have a PR later this evening.
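(This pairing is a common trick: a larger batch reduces the variance of the gradient estimate, which is why the higher learning rate stops diverging. A toy illustration with made-up values, not the settings from the PR:)

```python
# Scaling the learning rate and batch size together (made-up values,
# not the PR's settings): a larger batch gives a less noisy gradient
# estimate, so a proportionally higher learning rate can be used
# without the loss blowing up.
base_lr, base_batch_size = 0.005, 64
scale = 4
lr_init = base_lr * scale             # 0.02
batch_size = base_batch_size * scale  # 256
print(f"lr_init={lr_init}, batch_size={batch_size}")
```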
Hello, |
First of all, thank you very much for open-sourcing the code.
I have trained a model with your default configuration, but the results are not good. What may be the reason?
As far as I can tell, an agent that has mastered tic-tac-toe should always draw when playing against itself, yet the episode length has decreased instead.