add the general model for best of three game #78
Hi Lin,

I've been a longtime fan of yours. This time I used your project for a group assignment, so thanks for sharing it!

I ran some experiments on top of your work, trying to train an AI that can reliably win an entire match. I first tried training on randomized first and second rounds, but the results were not great, probably because the opponent's starting state varies too much; that model (10M steps) ended up with a 58% win rate.

I then tried training on the entire best-of-three match, refactoring the done condition in step. I added self.jump and self.round_end to skip the between-round cutscenes and to record whether a round has ended. After roughly 5M steps the reward had basically converged, and testing reached a 98% win rate. Very exciting!!
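For anyone curious what that refactor looks like, here is a minimal sketch (not the actual code from this PR) of tracking round outcomes so that done fires only when the whole best-of-three match is decided, with self.jump used to skip cutscene frames. All names and the health-based round-end signal are illustrative assumptions.

```python
class BestOfThreeTracker:
    """Sketch: ends an episode only when a best-of-three match is decided."""

    ROUNDS_TO_WIN = 2          # best of three: first side to two round wins
    TRANSITION_FRAMES = 60     # assumed length of the between-round cutscene

    def __init__(self):
        self.reset()

    def reset(self):
        self.agent_wins = 0
        self.enemy_wins = 0
        self.round_end = False  # did the current round just finish?
        self.jump = 0           # remaining cutscene frames to skip

    def update(self, agent_hp, enemy_hp):
        """Feed per-frame health values; returns (done, in_transition)."""
        if self.jump > 0:
            # Still inside the between-round cutscene: skip this frame.
            self.jump -= 1
            return False, True

        self.round_end = agent_hp <= 0 or enemy_hp <= 0
        if self.round_end:
            if enemy_hp <= 0 and agent_hp > 0:
                self.agent_wins += 1
            else:
                # Double KOs are counted against the agent here; a real
                # implementation would read the game's own round result.
                self.enemy_wins += 1
            # Episode ends only when the match, not a single round, is over.
            if max(self.agent_wins, self.enemy_wins) >= self.ROUNDS_TO_WIN:
                return True, False
            self.jump = self.TRANSITION_FRAMES
        return False, False
```

In a step method, frames flagged as in_transition would be skipped (no reward, no learning signal), and the episode terminates only once one side has taken two rounds.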
Below are my TensorBoard training curves: the blue line is the random-round training and the purple line is the entire-match training.

I've opened a pull request with the code and results for the general (best-of-three) approach. For the random-round approach I'm still tuning the reward function for a faster learning rate, so I won't upload it for now. I hope my results can be merged and shared with everyone.
Wishing you all the best!
Jing WANG