About the Design of the Training Steps in the AlphaZero Algorithm #1270
Quicksilver0218 started this conversation in General
As we can see in open_spiel/python/algorithms/alpha_zero/alpha_zero.py line 402, the algorithm is implemented roughly as below.
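A minimal sketch of that loop, reconstructed from the description that follows; `data` holds the rows available for this training step, `train_batch_size` is the configured batch size, and `update_weights` is a hypothetical stand-in for the actual model update. The exact OpenSpiel source may differ slightly.

```python
import random

# Draw len(data) rows in total, one uniformly random batch at a time.
# Each batch is sampled independently, so across batches some rows may
# be picked repeatedly while others are never picked at all.
for _ in range(len(data) // train_batch_size):
    batch = random.sample(data, train_batch_size)  # no repeats *within* a batch
    update_weights(batch)
```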
In each training step, `len(data)` rows are extracted from the dataset for training. However, they are extracted in batches drawn uniformly at random, i.e. although the total count is the same, some rows may never be selected while others could be selected more than once. Why don't we just iterate over all the data instead? e.g.:
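Something along these lines, assuming the same `data`, `train_batch_size`, and hypothetical `update_weights` as above: shuffle once, then walk through the data in consecutive slices so every row is visited exactly once.

```python
import random

random.shuffle(data)  # randomise the order once per training step
for i in range(0, len(data), train_batch_size):
    batch = data[i:i + train_batch_size]  # each row appears in exactly one batch
    update_weights(batch)                 # note: the final batch may be smaller
```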
Compared to this, the current implementation introduces uncertainty and an additional computational burden (in `random.sample(data, count)`). I see one advantage: the batch size can remain unchanged in each iteration, so we can utilise as much of the GPU as possible. However, we can also achieve that in this way:
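For instance, a sketch under the same assumptions as above: shuffle, emit only full batches, and top up the leftover rows with a few re-sampled rows so that every batch keeps the fixed size.

```python
import random

random.shuffle(data)
num_full = len(data) // train_batch_size
for i in range(num_full):
    batch = data[i * train_batch_size:(i + 1) * train_batch_size]
    update_weights(batch)  # every batch has exactly train_batch_size rows

# Keep the batch size fixed for the leftover rows as well by padding them
# with rows re-sampled from the rest of the data.
# (Assumes len(data) >= train_batch_size.)
remainder = data[num_full * train_batch_size:]
if remainder:
    filler = random.sample(data[:num_full * train_batch_size],
                           train_batch_size - len(remainder))
    update_weights(remainder + filler)
```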
So, are there other important reasons for using the current design?