mAP is too low but detect objects well with trained ckpt #27

Open · Janezzliu opened this issue Apr 15, 2018 · 10 comments

@Janezzliu

Hi @LevinJ, I applied SSD_tensorflow_VOC to my own dataset. I first trained the SSD-specific weights with self.max_number_of_steps = 10000, then trained the VGG16 and SSD-specific weights with self.max_number_of_steps = 900000. The first step has finished and the second step has reached 60000. My loss is around 1.8, training mAP is 0.18 and testing mAP is 0.17. However, when I use the trained ckpt to detect objects in the test pictures, it does well! So I went through your code and the website https://sanchom.wordpress.com/tag/average-precision/ to learn how mAP is computed. I didn't find anything wrong. I'm quite confused. The test results with the trained ckpt don't match the mAP of 0.17.

@Janezzliu
Author

Janezzliu commented Apr 15, 2018

I have found the reason. My dataset has 4 labels, so I need to construct a corresponding dictionary variable that stores the AP of every label in order to compute mAP.
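
Roughly, the fix amounts to averaging the per-label APs. A minimal sketch of that step (the label ids and AP values below are made up purely for illustration):

import numpy as np

# Hypothetical per-label AP values for a 4-label dataset; in practice these
# come from the evaluation script's per-class AP computation.
average_precision = {1: 0.72, 2: 0.65, 3: 0.81, 4: 0.58}

# mAP is just the mean of the per-label APs. If the dictionary is missing
# entries for some labels, the reported mAP gets dragged down even though
# detection itself looks fine.
mAP = np.mean(list(average_precision.values()))
print('mAP = %.3f' % mAP)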

@LevinJ
Owner

LevinJ commented Apr 16, 2018

Hi @Janezzliu, glad to learn you've got the issue fixed. Nice debugging!

@ghost

ghost commented Apr 19, 2018

@LevinJ, for all the poor souls out there still trying to figure out why there is such a big gap in evaluation:

  • From the previous repo, the issue was in the generation of tfrecords. The tfrecords have the difficults clamped to 0, so the ground truths are wrong. All ground truths are labeled as non-difficult when they shouldn't be, since difficult boxes are meant to be sorted out in the bboxes_matching method. The evaluation therefore eventually evaluates on difficults as well, which gives this big gap. The original Caffe implementation achieves 0.69 mAP on evaluation with difficult ground truths.
  • Note that you should still train on difficults though; they shouldn't be excluded from the training.

I am leaving this comment here since it's the most recent one regarding mAP. I haven't tested your code, but I see that the script there is the same, so I am going to assume that the error persists. I hope this helps.

@LevinJ
Owner

LevinJ commented Apr 20, 2018

Hi @bnbhehe , thanks for sharing your findings!

Can you elaborate a bit on "The tfrecords have the difficults clamped to 0, so the ground truths are wrong. All ground truths are labeled as non-difficult when they shouldn't be, since difficult boxes are meant to be sorted out in the bboxes_matching method."?

Which lines of code clamp the difficult attribute of training/evaluation samples to 0? I tried checking the code, but was not able to find them.

@ghost

ghost commented Apr 21, 2018

In the pascal_to_tfrecords script there is this line:

if obj.find('difficult'):

The thing is that find returns NoneType, therefore this check is always false. I decoded the tfrecords and there were no difficults; everything had a value of 0, therefore you always evaluate on them. For training though you do need them, which is why that filtering is not called in the train script.

You can notice this data corruption if you try to evaluate with difficults by switching the remove_difficult flag to false; the mAP should stay the same. (Writing from mobile, so sorry if my reply is vague.)

@LevinJ
Owner

LevinJ commented Apr 24, 2018

Hi @bnbhehe, not sure if it's an environment setup related issue, but it looks like obj.find('difficult') can return a valid object on my desktop, as you can see below.

[screenshot from 2018-04-24 07-58-13]

I checked the annotation file, and there is indeed a difficult field:

<object>
    <name>dog</name>
    <pose>Left</pose>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox>
        <xmin>48</xmin>
        <ymin>240</ymin>
        <xmax>195</xmax>
        <ymax>371</ymax>
    </bndbox>
</object>

What are your thoughts on this?

@ghost

ghost commented Apr 24, 2018

It should return something, but it's NoneType and the check will fail. Only if you call .text does it give a value, but if the label is not present it will crash. I used Python 3.5 and didn't get the behavior I wanted with this XML parser, so I rewrote process_image in the tfrecords script with xmltodict.

I would suggest you check the remove_difficult flag in the eval script. If the data is corrupted, turning it off will not affect the mAP. (Spoiler alert: for me it didn't.)

I would also make a small decoding script to see how many difficult annotations are actually present, something like the sketch below.
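
Something along these lines should do it (the tfrecord filename and the feature key 'image/object/bbox/difficult' are assumptions here; adjust them to whatever your conversion script actually writes):

import tensorflow as tf

def count_difficults(tfrecord_path):
    # Count total boxes and boxes flagged as difficult in one tfrecord file.
    total, difficult = 0, 0
    for record in tf.python_io.tf_record_iterator(tfrecord_path):
        example = tf.train.Example()
        example.ParseFromString(record)
        flags = example.features.feature['image/object/bbox/difficult'].int64_list.value
        total += len(flags)
        difficult += sum(flags)
    return total, difficult

# With the clamping bug, the difficult count comes out as 0 for every file.
print(count_difficults('voc_2007_train_000.tfrecord'))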

Please reply with the best mAP you get once this is fixed.

@LevinJ
Owner

LevinJ commented Apr 24, 2018

Hi @bnbhehe, I checked the code a bit more closely and agree that you are correct: currently all bounding boxes are mistakenly labelled as non-difficult.

This is because obj.find('difficult') returns an Element object whose len is zero, and an Element with no children evaluates as falsy; as a result, if obj.find('difficult'): always evaluates to False. For those who are interested, see here for more details. To fix the bug, one could simply replace the line with if obj.find('difficult') is not None:
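
A minimal sketch of the pitfall and the fix (the annotation snippet is trimmed to just the relevant field, and marked difficult so the difference is visible):

import xml.etree.ElementTree as ET

obj = ET.fromstring("""
<object>
    <name>dog</name>
    <difficult>1</difficult>
</object>
""")

elem = obj.find('difficult')
print(elem is None)   # False: the element was found
print(bool(elem))     # False: an Element with no children is falsy

# Buggy check: the branch is never taken, so the box is recorded as non-difficult.
difficult = int(obj.find('difficult').text) if obj.find('difficult') else 0
print(difficult)      # 0

# Fixed check: test against None explicitly.
difficult = int(obj.find('difficult').text) if obj.find('difficult') is not None else 0
print(difficult)      # 1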

As for evaluating the model on the non-difficult ground truth labels, I am currently quite tied up with other stuff, and might do it when I have some time :)

By the way, can you tell me where you found that "The original Caffe implementation achieves 0.69 mAP on evaluation with difficult ground truths."? Thanks.

@ghost

ghost commented Apr 24, 2018

@LevinJ, I trained it myself on Caffe, twice actually. I reproduced the results of the paper and got 70% and 69% on evaluation with difficults. If you have the original implementation, it's basically line 415 in this script; you switch that to True.

@Jasonsun1993

@Janezzliu I got the same problem. I was wondering in which file to construct the dictionary.
