
Interpreting the training curves on Tic Tac Toe #152

Closed
itmorn opened this issue Apr 23, 2021 · 13 comments
Labels: documentation (Improvements or additions to documentation)

itmorn commented Apr 23, 2021

[screenshot: training curves]
First of all, thank you very much for open-sourcing the code.
I have trained a model with your default configuration, but the results are not good. What might be the reason?
As far as I can tell, an agent that has mastered tic-tac-toe should always draw against itself, yet the episode length has decreased.

werner-duvaud (Owner) commented Apr 23, 2021

Hi,

What do you mean by "the results are not good"?

What I see is that:

  • On the MuZero reward plot [screenshot: MuZero reward curve]:
    MuZero used to draw or lose at the start; now it wins about one game in four.
    The curve is clearly increasing, so it seems to be learning and progressing.

  • On the opponent reward plot [screenshot: opponent reward curve]:
    It is clearly decreasing; the opponent now wins only about one game in three.
    In addition, it seems you are evaluating against the expert opponent, which is hand-coded to systematically block winning opportunities.

In conclusion, with longer training, MuZero should continue to progress and the opponent should be held to a draw more and more often.

werner-duvaud changed the title from "A training problem on TicTacToe" to "Interpreting the training curves on Tic Tac Toe" on Apr 23, 2021
werner-duvaud added the documentation (Improvements or additions to documentation) label on Apr 25, 2021
itmorn (Author) commented Apr 27, 2021

Thank you for your suggestion.
I trained the model for a longer time. The following figure shows the training curve:
[screenshot: training curves after longer training]

I found that the agent is not very smart. The log of a game between me and the agent is as follows:

log.txt

The poor move is shown here:
[screenshot: board position showing the agent's poor move]

I basically used the default configuration (self.num_workers = 7, self.selfplay_on_gpu = True, self.reanalyse_on_gpu = True). I noticed in the source code that the reward for both a draw and a loss is 0; could this be the cause?
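For context, a minimal sketch of the default behaviour being described, where only a winning move earns a reward; the function below is an illustration written for this discussion, not the repository's actual code:

```python
# Minimal sketch (an assumption, not muzero-general's exact code): the
# mover's reward in the default tic-tac-toe environment as described above.
def step_reward(move_won_the_game: bool) -> int:
    # +1 for a winning move; 0 for everything else, so a drawn game and a
    # lost game both yield 0 from the agent's perspective.
    return 1 if move_won_the_game else 0
```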

theword (Contributor) commented May 12, 2021

I ran the default like you did and got the same results. Good graphs, but MuZero performs poorly in actual games. For my second experiment, I changed the opponent from "expert" to "self", and these are my results (screenshots below). The loss curve is very good, which makes me believe the AI learned. Both MuZero and the opponent are heading toward 0 reward (I believe this is expected, right? More draws), but it still performs poorly in reality.

I was also thinking the same as you. I have made a code update: win = 1, draw = 0, and loss = -1, which is the reward scheme DeepMind reported in their paper. I will let you know how that experiment goes. (A sketch of this change follows the screenshots.)

[screenshots: training curves with the opponent set to "self"]
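A minimal sketch of the win = 1, draw = 0, loss = -1 mapping described above; the helper name and how it would be wired into the environment's step function are assumptions, not the actual patch:

```python
# Hypothetical helper illustrating the win/draw/loss scheme above.
def outcome_reward(winner, player):
    """Return +1 for a win, 0 for a draw, -1 for a loss.

    winner: the winning player's id, or None for a draw.
    player: the player from whose perspective the reward is computed.
    """
    if winner is None:  # game ended in a draw
        return 0
    return 1 if winner == player else -1
```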

itmorn (Author) commented May 13, 2021

@theword OK, looking forward to hearing from you.

JohnPPP commented May 13, 2021

Hi,

I also tried to train it "as is" and was not able to get good performance. As noted, even though the plots show some progress, it plays quite badly.

I opened an issue with the same problem:

#134

Even after training for almost 10 hours, I still do not reach the performance shown in the files attached there, which I think is very good.

Hope it helps.

Has anyone else had success with the training?

theword (Contributor) commented Jul 14, 2021

> @theword OK, looking forward to hearing from you.

I was able to get really impressive graphs, but it would still lose when I played against it. I expected it to draw every time.

JohnPPP commented Jul 23, 2021

> > @theword OK, looking forward to hearing from you.
>
> I was able to get really impressive graphs, but it would still lose when I played against it. I expected it to draw every time.

Same here; I was not able to train it to play tic-tac-toe. I am shifting to another AI algorithm, since I'm unable to train anything here.

AdrianAcala commented

@JohnPPP / @theword, I've modified the tic-tac-toe config a bit and got some really good results. It looks like the learning rate wasn't high enough, and the optimizer should be SGD instead of Adam. I also made sure there were as many workers as I have CPU threads to speed up the process, and then started tuning a learning-rate decay over time. The first time I played against it, embarrassingly, it beat me. 🤣 I've tested it and it's pretty good. I've attached a screenshot of my TensorBoard showing MuZero learning strongly while the opponent's reward goes to 0. (A config sketch follows the screenshot.)

[screenshot: TensorBoard — MuZero reward rising, opponent reward approaching 0]
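A hedged sketch of the kind of changes described above, written in the style of the MuZeroConfig attributes in games/tictactoe.py; the values here are illustrative assumptions, and the set actually submitted later is in #169:

```python
import multiprocessing

# Illustrative tuning sketch (assumed values, not the exact ones in #169).
class TicTacToeTuningSketch:
    def __init__(self):
        self.optimizer = "SGD"        # instead of the default "Adam"
        self.lr_init = 0.02           # higher initial learning rate (assumed)
        self.lr_decay_rate = 0.9      # decay the rate over time (assumed)
        self.lr_decay_steps = 10_000  # (assumed)
        # One self-play worker per available CPU thread.
        self.num_workers = multiprocessing.cpu_count()
```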

JohnPPP commented Jul 27, 2021

@AdrianAcala, that is great! Congrats on your improvements!

I've switched from this MuZero implementation to an AlphaZero one. I must say that this one is much better designed and implemented, and it is easier to create new environments for. However, I still cannot tune anything... It appears the fault is mine alone, and that's good.

Wishing all who come across this code great success in the AI world,
João

theword (Contributor) commented Jul 28, 2021

@AdrianAcala Could you share your hyperparameter config?

AdrianAcala commented

@theword, I'm running into issues replicating it. The learning rate is so high that the loss sometimes blows up to infinity, yet when I lower the learning rate, it takes forever to learn.

AdrianAcala commented

@theword, AHA! I was able to increase the learning rate without having it blow up by also increasing the batch size. I've tried this twice now, and both runs show great results. I will have a PR later this evening.
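A minimal sketch of the pairing described above: a larger batch averages the gradient over more samples per update, which tends to keep bigger learning-rate steps stable. Attribute names follow MuZeroConfig; the values are assumptions, not the submitted hyperparameters:

```python
# Illustrative pairing (assumed values): a larger batch smooths the gradient,
# letting a learning rate that previously diverged stay stable.
class StabilizedConfigSketch:
    def __init__(self):
        self.lr_init = 0.02    # higher learning rate that alone blew up
        self.batch_size = 512  # raised together with the learning rate
```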

ahainaut (Collaborator) commented Mar 5, 2022

Hello,
Thanks for your experiments and feedback on the hyperparameters!
Closing this issue since @AdrianAcala submitted a new set of hyperparameters (see #169).

ahainaut closed this as completed on Mar 5, 2022