
Reproducing the performance #26

Open
jsikyoon opened this issue Nov 4, 2024 · 6 comments
@jsikyoon

jsikyoon commented Nov 4, 2024

Hi @buoyancy99 ,

Thank you for sharing the source code of your project.

I tried to reproduce the Maze2D medium and large results with the configuration suggested in your paper, but I couldn't match the reported performance.

Could you check what I missed?

  • Using the paper branch
  • Using a batch size of 2048, and diffusion network sizes of 16 and 32 for medium and large, respectively.
  • For the large environment, I used 50K iterations.

When using the above configuration with the command shared in your README.md, I got an episode reward of about 117 for Maze2D-Large.
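For reference, the invocation looked roughly like this (a sketch from memory; the entry point, config group names, and override keys below are placeholders and may not exactly match the paper branch):

```bash
# Rough sketch of the reproduction run on the paper branch.
# The config group names and override keys are placeholders, not verified
# against the actual configs; they only reflect the settings listed above.
python -m main \
  experiment=exp_planning \
  dataset=maze2d_large \
  experiment.training.batch_size=2048 \
  experiment.training.max_steps=50000 \
  algorithm.network_size=32
```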

Best, Jaesik.

@buoyancy99
Owner

Hi Jaesik,

I haven't touched the v1 code in a while; I will take a look when I have more time. In the meantime, could you try the v1.5 code on the main branch with the transformer?

@jsikyoon
Author

jsikyoon commented Nov 5, 2024 via email

@hyeonscho

I believe increasing experiment.validation.limit_batch could help achieve more accurate results. In my case, I was able to reproduce the performance when I set it to 10.
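Concretely, something like this (a sketch; everything except the limit_batch override is a placeholder for whatever command you are already running):

```bash
# Same command as before, but with more validation batches evaluated.
# <your-usual-arguments> is a placeholder; only the final override matters here.
python -m main <your-usual-arguments> experiment.validation.limit_batch=10
```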

@jsikyoon
Author

jsikyoon commented Nov 6, 2024

Hi @buoyancy99 ,

While investigating this issue, I found that the setting suggested in the README only evaluates very few samples from the validation set. I then tried to evaluate every sample in the set, but it takes a very long time due to the roll-out-based planning. So I have two questions:

  • The numbers in the paper are from validation on the whole set, right?
  • If so, how did you run that evaluation? Did you evaluate only a few samples during development and then the whole set before submission?
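
For reference, this is roughly what I ran to evaluate every sample (assuming experiment.validation.limit_batch is forwarded to PyTorch Lightning's limit_val_batches, where 1.0 means all validation batches; that mapping is my assumption):

```bash
# Attempted full-set evaluation -- very slow because of the roll-out based planning.
# Assumes limit_batch maps to Lightning's limit_val_batches (1.0 = every batch).
python -m main <your-usual-arguments> experiment.validation.limit_batch=1.0
```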

Thank you in advance.
Best, Jaesik.

@buoyancy99
Owner

We do evaluate a good number of data points, and yes, it's very slow in the v1 RNN code, which is why I rewrote it into the v1.5 code on the main branch.

@jsikyoon
Author

jsikyoon commented Nov 6, 2024

Thank you for your quick reply. My concern is mostly resolved now.

If you want to close this issue, I am okay with that. If another issue comes up while I investigate your model, I will reopen it.

  • How many samples did you evaluate? Do you remember? @buoyancy99

Again, thank you for your contribution to the planning domain and for sharing your source code!
