
Reproducing the performance #26

Open
jsikyoon opened this issue Nov 4, 2024 · 6 comments
@jsikyoon

jsikyoon commented Nov 4, 2024

Hi @buoyancy99 ,

Thank you for sharing the source code of your project.

I tried to reproduce the Maze2D medium and large results with the configuration suggested in your paper, but I couldn't match the reported performance.

Could you check what I missed?

  • Using the paper branch
  • Using a batch size of 2048, and diffusion network sizes of 16 and 32 for medium and large, respectively.
  • For the large environment, I used 50K iterations.

When using the above configuration with the command shared in your README.md, I got an episode reward of about 117 for Maze2D-Large.
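For reference, the invocation looked roughly like this (a sketch from memory; the entry point, config group names, and override keys below are placeholders and may not exactly match the paper branch):

```bash
# Rough sketch of the reproduction run on the paper branch.
# The config group names and override keys are placeholders, not verified
# against the actual configs; they only reflect the settings listed above.
python -m main \
  experiment=exp_planning \
  dataset=maze2d_large \
  experiment.training.batch_size=2048 \
  experiment.training.max_steps=50000 \
  algorithm.network_size=32
```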

Best, Jaesik.

@buoyancy99
Owner

Hi Jaesik,

I haven't touched the v1 code in a while; I will take a look when I have more time. In the meantime, could you try the v1.5 code on the main branch with the transformer?

@jsikyoon
Author

jsikyoon commented Nov 5, 2024 via email

@hyeonscho

I believe increasing experiment.validation.limit_batch could help achieve more accurate results. In my case, I was able to reproduce the performance when I set it to 10.
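Concretely, something like this (a sketch; everything except the limit_batch override is a placeholder for whatever command you are already running):

```bash
# Same command as before, but with more validation batches evaluated.
# <your-usual-arguments> is a placeholder; only the final override matters here.
python -m main <your-usual-arguments> experiment.validation.limit_batch=10
```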

@jsikyoon
Author

jsikyoon commented Nov 6, 2024

Hi @buoyancy99 ,

While investigating this issue, I found that the setting suggested in the README only evaluates very few samples from the validation set. I then tried to evaluate every sample in the set, but it takes a very long time due to the roll-out-based planning. So I have two questions:

  • The numbers in the paper are from validation on the whole set, right?
  • If so, how did you run that evaluation? Did you evaluate only a few samples during development and then the whole set before submission?
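
For reference, this is roughly what I ran to evaluate every sample (assuming experiment.validation.limit_batch is forwarded to PyTorch Lightning's limit_val_batches, where 1.0 means all validation batches; that mapping is my assumption):

```bash
# Attempted full-set evaluation -- very slow because of the roll-out based planning.
# Assumes limit_batch maps to Lightning's limit_val_batches (1.0 = every batch).
python -m main <your-usual-arguments> experiment.validation.limit_batch=1.0
```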

Thank you in advance.
Best, Jaesik.

@buoyancy99
Owner

We do evaluate a good number of data points, and yes, it's very slow in the v1 RNN code, which is why I rewrote it into the v1.5 code on the main branch.

@jsikyoon
Author

jsikyoon commented Nov 6, 2024

Thank you for your quick reply. My concern is mostly resolved now.

If you want to close this issue, I am okay with that. If another issue comes up while I investigate your model, I will reopen it.

  • How many samples did you evaluate? Do you remember? @buoyancy99

Again, thank you for your contribution to the planning domain and for sharing your source code!
