ReplayBuffer size with on-policy algorithm #1161

jvasso · 2024-06-13T00:11:34Z

Hi,
Can someone confirm that when using an on-policy algorithm like PPO, the size of the ReplayBuffer should be equal to the value of step_per_collect? Unless I'm mistaken, on-policy algorithms require the data used for training to be collected from the same policy that is currently being learned. Therefore, if the ReplayBuffer size exceeds step_per_collect, the next policy update might be based on data collected with old versions of the policy, which could distort the gradient.

Answered by MischaPanch

MischaPanch (Maintainer) · 2024-09-02T17:35:43Z

@jvasso Essentially yes: the buffer size should not be smaller than step_per_collect. If it's larger, that's not a problem, since the buffer is reset before the next learning step, so no data from older versions of the policy carries over.

This structure (having to deal with the buffer size at all for an on-policy algorithm) is one of Tianshou's main technical debts, and we're planning to address it with a major refactoring that will be part of Tianshou 2.0.0.
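To make the reasoning above concrete, here is a minimal toy sketch in plain Python. It does not use Tianshou: `ToyRingBuffer`, `run`, and the `reset_after_update` flag are hypothetical names invented for this illustration, and the ring-buffer overwrite behaviour is an assumption about how a fixed-size buffer typically works. It only shows which policy versions an update would see under three configurations.

```python
class ToyRingBuffer:
    """Hypothetical fixed-size buffer (a stand-in, not Tianshou's ReplayBuffer)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.data: list[tuple[int, int]] = []
        self._next = 0  # slot that the next add() writes to

    def add(self, item: tuple[int, int]) -> None:
        if len(self.data) < self.size:
            self.data.append(item)
        else:
            self.data[self._next] = item  # buffer is full: overwrite the oldest entry
        self._next = (self._next + 1) % self.size

    def reset(self) -> None:
        self.data.clear()
        self._next = 0


def run(buffer_size: int, step_per_collect: int, n_updates: int,
        reset_after_update: bool = True) -> list[tuple[int, list[int]]]:
    """For each update, return (number of stored transitions, policy versions present)."""
    buf = ToyRingBuffer(buffer_size)
    seen = []
    for version in range(n_updates):
        # Collect step_per_collect transitions with the current policy version.
        for step in range(step_per_collect):
            buf.add((version, step))
        # The "update" would train on whatever is in the buffer right now.
        versions = sorted({v for v, _ in buf.data})
        seen.append((len(buf.data), versions))
        if reset_after_update:
            buf.reset()  # assumed on-policy behaviour: clear before the next collect
    return seen


if __name__ == "__main__":
    print("larger buffer, with reset:   ", run(8, 4, 3))
    print("smaller buffer, with reset:  ", run(2, 4, 3))
    print("larger buffer, without reset:", run(8, 4, 3, reset_after_update=False))
```

In this toy model, a buffer larger than step_per_collect only ever holds data from the current policy version as long as it is reset after each update. A buffer smaller than step_per_collect silently drops part of each collection, and the stale-data concern from the question only appears in the third case, where the reset is skipped.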
