
Interpreting the training curves on Tic Tac Toe #152

Closed
itmorn opened this issue Apr 23, 2021 · 13 comments
Labels: documentation (Improvements or additions to documentation)

itmorn commented Apr 23, 2021

[screenshot: training curves]
First of all, thank you very much for open-sourcing the code.
I have trained a model with your default configuration, but the results are not good. What might be the reason?
As far as I can tell, an agent that has mastered tic-tac-toe should always draw against itself, yet the episode length has decreased.

werner-duvaud (Owner) commented Apr 23, 2021

Hi,

What do you mean by "the results are not good"?

What I see is that:

  • On the MuZero reward plot [screenshot: MuZero reward curve]:
    MuZero used to draw or lose at the start; now it wins about one game in four.
    The curve is clearly increasing, so it seems to be learning and progressing.

  • On the opponent reward plot [screenshot: opponent reward curve]:
    It is clearly decreasing; the opponent now wins only about one game in three.
    In addition, it seems you are evaluating against the expert opponent, which is hand-coded to systematically block winning opportunities.

In conclusion, with longer training, MuZero should continue to progress and the opponent should be held to a draw more and more often.

werner-duvaud changed the title from "A training problem on TicTacToe" to "Interpreting the training curves on Tic Tac Toe" on Apr 23, 2021
werner-duvaud added the documentation (Improvements or additions to documentation) label on Apr 25, 2021
itmorn (Author) commented Apr 27, 2021

Thank you for your suggestion.
I trained the model for a longer time. The following figure shows the training curve:
[screenshot: training curves after longer training]

I found that the agent is not very smart. The log of a game between me and the agent is as follows:

log.txt

The poor move is shown here:
[screenshot: board position showing the agent's poor move]

I basically used the default configuration (self.num_workers = 7, self.selfplay_on_gpu = True, self.reanalyse_on_gpu = True). I noticed in the source code that the reward for both a draw and a loss is 0; could this be the cause?
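For context, a minimal sketch of the default behaviour being described, where only a winning move earns a reward; the function below is an illustration written for this discussion, not the repository's actual code:

```python
# Minimal sketch (an assumption, not muzero-general's exact code): the
# mover's reward in the default tic-tac-toe environment as described above.
def step_reward(move_won_the_game: bool) -> int:
    # +1 for a winning move; 0 for everything else, so a drawn game and a
    # lost game both yield 0 from the agent's perspective.
    return 1 if move_won_the_game else 0
```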

theword (Contributor) commented May 12, 2021

I ran the default like you did and got the same results. Good graphs, but MuZero performs poorly in actual games. For my second experiment, I changed the opponent from "expert" to "self", and these are my results (screenshots below). The loss curve is very good, which makes me believe the AI learned. Both MuZero and the opponent are heading toward 0 reward (I believe this is expected, right? More draws), but it still performs poorly in reality.

I was also thinking the same as you. I have made a code update: win = 1, draw = 0, and loss = -1, which is the reward scheme DeepMind reported in their paper. I will let you know how that experiment goes. (A sketch of this change follows the screenshots.)

[screenshots: training curves with the opponent set to "self"]
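A minimal sketch of the win = 1, draw = 0, loss = -1 mapping described above; the helper name and how it would be wired into the environment's step function are assumptions, not the actual patch:

```python
# Hypothetical helper illustrating the win/draw/loss scheme above.
def outcome_reward(winner, player):
    """Return +1 for a win, 0 for a draw, -1 for a loss.

    winner: the winning player's id, or None for a draw.
    player: the player from whose perspective the reward is computed.
    """
    if winner is None:  # game ended in a draw
        return 0
    return 1 if winner == player else -1
```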

itmorn (Author) commented May 13, 2021

@theword OK, looking forward to hearing from you.

JohnPPP commented May 13, 2021

Hi,

I also tried to train it "as is" and was not able to get good performance. As noted, even though the plots show some progress, it plays quite badly.

I opened an issue with the same problem:

#134

Even after training for almost 10 hours, I still do not reach the performance shown in the files attached there, which I think is very good.

Hope it helps.

Has anyone else had success with the training?

theword (Contributor) commented Jul 14, 2021

> @theword OK, looking forward to hearing from you.

I was able to get really impressive graphs, but it would still lose when I played against it. I expected it to draw every time.

JohnPPP commented Jul 23, 2021

> > @theword OK, looking forward to hearing from you.
>
> I was able to get really impressive graphs, but it would still lose when I played against it. I expected it to draw every time.

Same here; I was not able to train it to play tic-tac-toe. I am shifting to another AI algorithm, since I'm unable to train anything here.

AdrianAcala commented

@JohnPPP / @theword, I've modified the tic-tac-toe config a bit and got some really good results. It looks like the learning rate wasn't high enough, and the optimizer should be SGD instead of Adam. I also made sure there were as many workers as I have CPU threads to speed up the process, and then started tuning a learning-rate decay over time. The first time I played against it, embarrassingly, it beat me. 🤣 I've tested it and it's pretty good. I've attached a screenshot of my TensorBoard showing MuZero learning strongly while the opponent's reward goes to 0. (A config sketch follows the screenshot.)

[screenshot: TensorBoard — MuZero reward rising, opponent reward approaching 0]
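A hedged sketch of the kind of changes described above, written in the style of the MuZeroConfig attributes in games/tictactoe.py; the values here are illustrative assumptions, and the set actually submitted later is in #169:

```python
import multiprocessing

# Illustrative tuning sketch (assumed values, not the exact ones in #169).
class TicTacToeTuningSketch:
    def __init__(self):
        self.optimizer = "SGD"        # instead of the default "Adam"
        self.lr_init = 0.02           # higher initial learning rate (assumed)
        self.lr_decay_rate = 0.9      # decay the rate over time (assumed)
        self.lr_decay_steps = 10_000  # (assumed)
        # One self-play worker per available CPU thread.
        self.num_workers = multiprocessing.cpu_count()
```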

JohnPPP commented Jul 27, 2021

@AdrianAcala, that is great! Congrats on your improvements!

I've switched from this MuZero implementation to an AlphaZero one. I must say that this one is much better designed and implemented, and it is easier to create new environments for. However, I still cannot tune anything... It appears the fault is mine alone, and that's good.

Wishing all who come across this code great success in the AI world,
João

theword (Contributor) commented Jul 28, 2021

@AdrianAcala Could you share your hyperparameter config?

AdrianAcala commented

@theword, I'm running into issues replicating it. The learning rate is so high that the loss sometimes blows up to infinity, yet when I lower the learning rate, it takes forever to learn.

AdrianAcala commented

@theword, AHA! I was able to increase the learning rate without having it blow up by also increasing the batch size. I've tried this twice now, and both runs show great results. I will have a PR later this evening.
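A minimal sketch of the pairing described above: a larger batch averages the gradient over more samples per update, which tends to keep bigger learning-rate steps stable. Attribute names follow MuZeroConfig; the values are assumptions, not the submitted hyperparameters:

```python
# Illustrative pairing (assumed values): a larger batch smooths the gradient,
# letting a learning rate that previously diverged stay stable.
class StabilizedConfigSketch:
    def __init__(self):
        self.lr_init = 0.02    # higher learning rate that alone blew up
        self.batch_size = 512  # raised together with the learning rate
```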

ahainaut (Collaborator) commented Mar 5, 2022

Hello,
Thanks for your experiments and feedback on the hyperparameters!
Closing this issue since @AdrianAcala submitted a new set of hyperparameters (see #169).

ahainaut closed this as completed on Mar 5, 2022