Evaluation code for benchmark #16
Comments
@iisxuwei, thanks for your interest! We will release the evaluation code for the benchmark very soon, during this holiday season. Stay tuned! It is indeed unfair to compare with other methods that were evaluated on the full validation set, but unfortunately we did not have enough quota to call GPT-4V. We did evaluate some methods on our samples and noted this in our table. We are looking into how to set up a better evaluation pipeline for all methods. Thanks!
Hi, I'm wondering about some of the metrics in the evaluation benchmark, like mIoU and [email protected]. On the benchmark page, the prompts for REC and RES are the same, and GPT's return is a mark number or a range of mark numbers. How do you compute the metrics from GPT's return?
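For reference, here is a minimal sketch (not the authors' released evaluation code) of one common way to score mark-based GPT-4V outputs for REC/RES, assuming each mark index returned by GPT maps to a candidate box/mask produced by the marking stage; the field names (`pred_marks`, `mark_boxes`, `mark_masks`, `gt_box`, `gt_mask`) are hypothetical.

```python
# Hypothetical sketch: accuracy at IoU 0.5 for REC and mIoU for RES,
# given GPT's returned mark indices and per-mark candidate boxes/masks.
import numpy as np

def box_iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def mask_iou(mask_a, mask_b):
    """IoU of two boolean segmentation masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / (union + 1e-9)

def evaluate(samples, iou_thresh=0.5):
    """samples: list of dicts with keys
       'pred_marks'  - mark indices parsed from GPT's reply,
       'mark_boxes'  - per-mark candidate boxes from the marking stage,
       'mark_masks'  - per-mark candidate masks from the marking stage,
       'gt_box' / 'gt_mask' - ground-truth annotation."""
    rec_hits, res_ious = [], []
    for s in samples:
        # REC: use the first predicted mark's box; a hit if IoU exceeds the threshold.
        pred_box = s["mark_boxes"][s["pred_marks"][0]]
        rec_hits.append(box_iou(pred_box, s["gt_box"]) > iou_thresh)
        # RES: union the masks of all predicted marks, accumulate IoU for mIoU.
        pred_mask = np.any([s["mark_masks"][m] for m in s["pred_marks"]], axis=0)
        res_ious.append(mask_iou(pred_mask, s["gt_mask"]))
    return {"acc@0.5": float(np.mean(rec_hits)), "mIoU": float(np.mean(res_ious))}
```

Whether the official evaluation takes the first mark, the union of marks, or something else when GPT returns a range of mark numbers is exactly what the question above asks; the sketch only illustrates the general scoring pipeline.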
Hi! P.S. I am very confused about the RefCOCOg results in the experimental results section (Table 2).
I hope you can reply as soon as possible. I am very interested in your work and would like to cite it. Thank you!
Hi, I'm very interested in your work and would like to know whether the evaluation code for the benchmark will be released. Also, isn't selecting only 100 images for evaluation too few and potentially unfair? Although it seems there is no alternative, given the API limitations of GPT-4V.
Looking forward to your reply.