
About the calculation of precision and recall of MLLMs #3

Open
backseason opened this issue Dec 3, 2024 · 10 comments

Comments

@backseason

Thanks for open-sourcing the code. Will you share the implementation used to calculate the precision, recall, and mAP scores of the MLLMs in Table 2 of your paper?

@Mountchicken
Collaborator

Hi @backseason,
Thanks for your interest. We'll provide example code to evaluate the detection metrics ASAP!

@backseason
Author

Thanks for the quick feedback. Since the code might take a few days, could you first share more details about the evaluation setting?

  1. Are the precision and recall in Table 2 reported at a specific confidence threshold?
  2. Are precision and recall micro-averaged (calculated across all predictions without a class-wise breakdown) or class-wise averaged?
  3. Is mAP computed across the IoU range of [0.5:0.95]?

@Mountchicken
Collaborator

Sure. Precision and recall are calculated at an IoU threshold of 0.5: we compute them for each individual class and then average across all classes. As for mAP, it is computed over the IoU range [0.5:0.95], which aligns the calculation with other detection methods that report mAP in this manner.
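
For readers landing here, a minimal Python sketch of that protocol (this is not the authors' evaluation script; the greedy IoU matching and the `{class_id: [(preds, gts), ...]}` data layout are assumptions for illustration):

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def match_one_image(preds, gts, iou_thr=0.5):
    """Greedily match each prediction to its best unmatched GT; count TPs."""
    matched = [False] * len(gts)
    tp = 0
    for p in preds:
        ious = [iou(p, g) for g in gts]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= iou_thr and not matched[best]:
            matched[best] = True
            tp += 1
    return tp

def class_averaged_pr(dataset, iou_thr=0.5):
    """dataset: {class_id: [(pred_boxes, gt_boxes) per image, ...]}.

    Accumulates TP / #predictions / #GT per class over all images,
    computes per-class precision and recall, then averages across
    classes (class-wise averaging, not micro-averaging).
    """
    precisions, recalls = [], []
    for images in dataset.values():
        tp = n_pred = n_gt = 0
        for preds, gts in images:
            tp += match_one_image(preds, gts, iou_thr)
            n_pred += len(preds)
            n_gt += len(gts)
        precisions.append(tp / max(n_pred, 1))
        recalls.append(tp / max(n_gt, 1))
    return float(np.mean(precisions)), float(np.mean(recalls))
```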

@backseason
Author

Which confidence threshold did you use to calculate the precision and recall?

@Mountchicken
Collaborator

For ChatRex, we use a score threshold of 0.3. The other MLLMs don't predict a confidence score for each box, so no threshold is applied.

@backseason
Author

I did notice that in Table 6 of your paper you wrote that "R@0.3 and P@0.3 represents recall and precision at score threshold at 0.3." So you also used a score threshold of 0.3 in Table 2 (score threshold of 0.3 and IoU threshold of 0.5). Is that correct?

@Mountchicken
Collaborator

In Table 2, the IoU threshold is 0.5 when calculating the recall and precision metrics. When calculating mAP, the IoU range is [0.5:0.95].
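
For anyone reproducing that number, the standard way to compute mAP over IoU [0.5:0.95] is pycocotools' `COCOeval` (a sketch; the two file names are placeholders for COCO-format ground truth and detection results):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: COCO-format annotations and a detection results file
# (a list of {"image_id", "category_id", "bbox", "score"} entries).
coco_gt = COCO("annotations.json")
coco_dt = coco_gt.loadRes("detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # first printed line is AP averaged over IoU 0.50:0.95
```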

@backseason
Author

I mean the confidence (score) threshold used when calculating the precision and recall metrics in Table 2, not the IoU threshold, which is 0.5.

@Mountchicken
Collaborator

For ChatRex, the confidence threshold is 0.3, and it is only used to filter the proposals output by UPN. The other MLLMs produce no confidence scores.
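
Putting the thread together, the only thresholding step would look something like this (a sketch; `boxes` and `scores` are illustrative names for UPN's proposals and their confidences):

```python
SCORE_THR = 0.3  # the cutoff discussed above, applied only to ChatRex/UPN outputs

def filter_by_score(boxes, scores, thr=SCORE_THR):
    """Keep proposals whose confidence is at least `thr`.

    MLLMs that emit no scores skip this step entirely: all of their
    predicted boxes go straight into the IoU-0.5 matching.
    """
    return [b for b, s in zip(boxes, scores) if s >= thr]
```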

@backseason
Author

I get it! Thank you for your patience.
