About the Design of the Training Steps in the AlphaZero Algorithm #1270
Quicksilver0218 started this conversation in General
As we can see in open_spiel/python/algorithms/alpha_zero/alpha_zero.py line 402, the algorithm is implemented roughly as below.
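A minimal sketch of that loop, reconstructed from the description that follows; `data` holds the rows available for this training step, `train_batch_size` is the configured batch size, and `update_weights` is a hypothetical stand-in for the actual model update. The exact OpenSpiel source may differ slightly.

```python
import random

# Draw len(data) rows in total, one uniformly random batch at a time.
# Each batch is sampled independently, so across batches some rows may
# be picked repeatedly while others are never picked at all.
for _ in range(len(data) // train_batch_size):
    batch = random.sample(data, train_batch_size)  # no repeats *within* a batch
    update_weights(batch)
```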
In each training step, `len(data)` rows are extracted from the dataset for training. However, they are extracted in batches drawn uniformly at random, i.e. although the total count is the same, some rows may never be selected while others could be selected more than once. Why don't we just iterate over all the data instead? e.g.:
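Something along these lines, assuming the same `data`, `train_batch_size`, and hypothetical `update_weights` as above: shuffle once, then walk through the data in consecutive slices so every row is visited exactly once.

```python
import random

random.shuffle(data)  # randomise the order once per training step
for i in range(0, len(data), train_batch_size):
    batch = data[i:i + train_batch_size]  # each row appears in exactly one batch
    update_weights(batch)                 # note: the final batch may be smaller
```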
Compared to this, the current implementation introduces uncertainty and an additional computational burden (in `random.sample(data, count)`). I see one advantage: the batch size can remain unchanged in each iteration, so we can utilise as much of the GPU as possible. However, we can also achieve that in this way:
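For instance, a sketch under the same assumptions as above: shuffle, emit only full batches, and top up the leftover rows with a few re-sampled rows so that every batch keeps the fixed size.

```python
import random

random.shuffle(data)
num_full = len(data) // train_batch_size
for i in range(num_full):
    batch = data[i * train_batch_size:(i + 1) * train_batch_size]
    update_weights(batch)  # every batch has exactly train_batch_size rows

# Keep the batch size fixed for the leftover rows as well by padding them
# with rows re-sampled from the rest of the data.
# (Assumes len(data) >= train_batch_size.)
remainder = data[num_full * train_batch_size:]
if remainder:
    filler = random.sample(data[:num_full * train_batch_size],
                           train_batch_size - len(remainder))
    update_weights(remainder + filler)
```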
So, are there other important reasons for using the current design?