Discrepancy in VCD Reproduced Results for POPE Experiment #24

Open
jiyunBae007 opened this issue Nov 11, 2024 · 9 comments

Comments

@jiyunBae007

Hello,

Thank you for sharing the code and work.
I attempted to reproduce the POPE experiment results on the MSCOCO dataset. While the reproduced values for the Regular decoding setting are relatively close to the reported values, there is a significant discrepancy in the VCD decoding results compared to the values presented in the paper.

Following the paper's experimental setup, I used the five seeds mentioned (42, 1234, 2024, 777, 9999) and obtained the results shown in the table below. For all other hyperparameters and environment settings, I used the code exactly as provided in the GitHub repository.
image
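
For anyone comparing runs across several seeds, here is a minimal sketch of how per-seed metrics could be averaged before comparing against the paper. The function name `summarize_runs` and the dictionary layout are illustrative assumptions; they are not part of the VCD repository.

```python
from statistics import mean, stdev


def summarize_runs(per_seed):
    """Aggregate metrics over seeds.

    per_seed maps seed -> {metric_name: value}.
    Returns metric_name -> (mean, sample standard deviation).
    """
    if not per_seed:
        raise ValueError("no runs to summarize")
    # Assumes every run reports the same metric names as the first one.
    metric_names = next(iter(per_seed.values())).keys()
    summary = {}
    for name in metric_names:
        values = [run[name] for run in per_seed.values()]
        # stdev needs at least two samples; report 0.0 for a single run.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean(values), spread)
    return summary
```

Reporting the spread alongside the mean makes it easier to tell whether a gap to the published numbers is larger than the seed-to-seed variation.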

Please let me know if there are any additional steps or specific settings not mentioned in the paper that could explain this discrepancy, particularly for the VCD decoding setup.

Thank you.

@maming109

I'm getting similar results to yours

@LengSicong
Collaborator

Hi, please refer to the hyper-parameter settings mentioned in the main paper (especially for the decoding settings).

@alex1243423

Hello, where did you see these five seeds?

@jiyunBae007
Author

> Hi, please refer to the hyper-parameter settings mentioned in the main paper (especially for the decoding settings).

Hello. Thank you for your reply.
I have carefully followed the settings mentioned in both the paper and the provided code but still could not reproduce the results.

Specifically, I ran the code experiments/eval/object_hallucination_vqa_llava.py using the following command:

python object_hallucination_vqa_llava.py --use_cd --temperature 1.0 --top_p None --top_k None --noise_step 500 --cd_alpha 1 --cd_beta 0.1

These settings are consistent with the default values stated in the paper and the code.
Could you kindly help me identify where I might be making a mistake or if there are any additional considerations that might not be explicitly mentioned in the paper?
Thank you in advance for your support!

@jiyunBae007
Author

> Hello, where did you see these five seeds?

Hello. I picked these five seeds arbitrarily.

@alex1243423

For POPE, the noise step is 999.

@jiyunBae007
Author

jiyunBae007 commented Dec 2, 2024

> For POPE, the noise step is 999.

Thank you for pointing out the part I missed regarding the noise step being set to 999.
However, even after updating the noise step to 999, I am still unable to reproduce the expected results, as shown in the table below:

image

If you have any additional suggestions, I’d appreciate it.

@6thChoice

I found that setting temperature = 0.1 and cd_beta = 0.0001, which differs from the paper's settings, gives results pretty close to the paper. I ran this on a 3090. By the way, the author of the paper mentioned that different hardware can influence the results. Could you share your hardware information?
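
To make the role of these parameters concrete, here is a minimal sketch of a single contrastive decoding step, based on my reading of the VCD paper rather than copied from the repository. The function names and the plain-list representation of logits are assumptions for illustration; the actual code operates on tensors.

```python
import math


def contrastive_logits(logits, logits_noisy, cd_alpha=1.0, cd_beta=0.1):
    """Combine logits from the original and the distorted image for one step."""
    if not 0 < cd_beta <= 1:
        raise ValueError("cd_beta is expected to lie in (0, 1]")
    # Plausibility constraint: keep tokens whose probability under the original
    # image is at least cd_beta times that of the most likely token.
    # In log space this is: logit >= max_logit + log(cd_beta).
    cutoff = max(logits) + math.log(cd_beta)
    combined = []
    for orig, noisy in zip(logits, logits_noisy):
        if orig < cutoff:
            combined.append(float("-inf"))  # token is masked out
        else:
            combined.append((1 + cd_alpha) * orig - cd_alpha * noisy)
    return combined


def softmax(logits, temperature=1.0):
    """Turn logits into sampling probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Under this sketch, a smaller cd_beta lowers the cutoff so fewer tokens are masked, while a lower temperature concentrates the sampling distribution on the highest-scoring tokens, which reduces the variation between seeds.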

@jiyunBae007
Author

Happy New Year! :)
I used an A6000 (48GB) for my experiments.
On a positive note, I was able to reproduce the results presented in the paper by setting the numpy version to 1.23.0.
Hope this helps!
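
For anyone who wants to confirm their environment before rerunning, a small sketch of a version check follows. The helper names `installed_version` and `check_version` are made up for this example; only `importlib.metadata` from the standard library is used.

```python
from importlib import metadata

EXPECTED_NUMPY = "1.23.0"  # version reported to work in this thread


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_version(installed, expected=EXPECTED_NUMPY):
    """Return a short status message for the given version strings."""
    if installed is None:
        return "numpy is not installed"
    if installed == expected:
        return f"numpy {installed} matches the version reported to work"
    return f"numpy {installed} differs from {expected}; results may not match"


if __name__ == "__main__":
    print(check_version(installed_version("numpy")))
```

If the versions differ, `pip install numpy==1.23.0` pins the version mentioned above.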
