
About the calculation of precision and recall of MLLMs #3

Open
backseason opened this issue Dec 3, 2024 · 10 comments

Comments

@backseason

Thanks for open-sourcing the code. Will you share the implementation used to calculate the precision, recall, and mAP scores of the MLLMs in Table 2 of your paper?

@Mountchicken
Collaborator

Hi @backseason,
Thanks for your interest. We'll provide example code to evaluate the detection metrics ASAP!

@backseason
Author

Thanks for the quick feedback. Since the code might take a few days, could you first share more details about the evaluation setting?

  1. Are the precision and recall in Table 2 reported at a specific confidence threshold?
  2. Are precision and recall micro-averaged (calculated across all predictions without a class-wise breakdown) or class-wise averaged?
  3. Is mAP computed across the IoU range of [0.5:0.95]?

@Mountchicken
Collaborator

Sure. Precision and recall are calculated at an IoU threshold of 0.5: we compute them for each individual class and then average across all classes. As for mAP, it is computed over the IoU range [0.5:0.95], which aligns the calculation with other detection methods that report mAP in this manner.
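
For readers landing here, a minimal Python sketch of that protocol (this is not the authors' evaluation script; the greedy IoU matching and the `{class_id: [(preds, gts), ...]}` data layout are assumptions for illustration):

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def match_one_image(preds, gts, iou_thr=0.5):
    """Greedily match each prediction to its best unmatched GT; count TPs."""
    matched = [False] * len(gts)
    tp = 0
    for p in preds:
        ious = [iou(p, g) for g in gts]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= iou_thr and not matched[best]:
            matched[best] = True
            tp += 1
    return tp

def class_averaged_pr(dataset, iou_thr=0.5):
    """dataset: {class_id: [(pred_boxes, gt_boxes) per image, ...]}.

    Accumulates TP / #predictions / #GT per class over all images,
    computes per-class precision and recall, then averages across
    classes (class-wise averaging, not micro-averaging).
    """
    precisions, recalls = [], []
    for images in dataset.values():
        tp = n_pred = n_gt = 0
        for preds, gts in images:
            tp += match_one_image(preds, gts, iou_thr)
            n_pred += len(preds)
            n_gt += len(gts)
        precisions.append(tp / max(n_pred, 1))
        recalls.append(tp / max(n_gt, 1))
    return float(np.mean(precisions)), float(np.mean(recalls))
```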

@backseason
Author

Which confidence threshold did you use to calculate the precision and recall?

@Mountchicken
Collaborator

For ChatRex, we use a score threshold of 0.3. The other MLLMs don't predict a confidence score for each box, so no threshold is applied.

@backseason
Author

I did notice that in Table 6 of your paper you wrote that "R@0.3 and P@0.3 represents recall and precision at score threshold at 0.3." So you also used a score threshold of 0.3 in Table 2 (score threshold of 0.3 and IoU threshold of 0.5). Is that correct?

@Mountchicken
Collaborator

In Table 2, the IoU threshold is 0.5 when calculating the recall and precision metrics. When calculating mAP, the IoU range is [0.5:0.95].
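
For anyone reproducing that number, the standard way to compute mAP over IoU [0.5:0.95] is pycocotools' `COCOeval` (a sketch; the two file names are placeholders for COCO-format ground truth and detection results):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: COCO-format annotations and a detection results file
# (a list of {"image_id", "category_id", "bbox", "score"} entries).
coco_gt = COCO("annotations.json")
coco_dt = coco_gt.loadRes("detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # first printed line is AP averaged over IoU 0.50:0.95
```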

@backseason
Author

I mean the confidence (score) threshold used when calculating the precision and recall metrics in Table 2, not the IoU threshold, which is 0.5.

@Mountchicken
Collaborator

For ChatRex, the confidence threshold is 0.3, and it is only used to filter the proposals output by UPN. The other MLLMs produce no confidence scores.
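
Putting the thread together, the only thresholding step would look something like this (a sketch; `boxes` and `scores` are illustrative names for UPN's proposals and their confidences):

```python
SCORE_THR = 0.3  # the cutoff discussed above, applied only to ChatRex/UPN outputs

def filter_by_score(boxes, scores, thr=SCORE_THR):
    """Keep proposals whose confidence is at least `thr`.

    MLLMs that emit no scores skip this step entirely: all of their
    predicted boxes go straight into the IoU-0.5 matching.
    """
    return [b for b, s in zip(boxes, scores) if s >= thr]
```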

@backseason
Author

I get it! Thank you for your patience.
