
mAP NAN while training with custom dataset #1765

Open
YCAyca opened this issue Mar 27, 2024 · 2 comments

Comments


YCAyca commented Mar 27, 2024

Hello, I am trying to train YOLOX-X on my custom dataset in COCO format. Training was fine with a small version of my dataset (~3K images) using the default settings in yolox_base.py (only the batch size was set to 4, due to limited GPU memory), but when I train on the big version of my dataset (~10K images) I can't find any way to prevent the mAP from being NaN for every class, starting from the first epoch. I have tried many things (the corresponding Exp overrides are sketched right after this list):

  • Dividing self.basic_lr_per_img by 10
  • Dividing self.basic_lr_per_img by 100
  • Reducing the number of epochs to 100, then to 75 (thinking that the YOLOX cosine warmup learning-rate scheduler is arranged according to the total iteration count, and since my iterations per epoch are now 3x higher, lowering max_epoch might help)
  • Using multiple GPUs so that I can set batch size = 16, which gives a number of iterations per epoch very similar to my previous training
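
For reference, the kind of Exp overrides involved look roughly like the sketch below (the paths, class count, and exact values are illustrative placeholders, not my literal config):

```python
# exps/my_yolox_x.py -- hypothetical custom experiment file
import os

from yolox.exp import Exp as BaseExp


class Exp(BaseExp):
    def __init__(self):
        super().__init__()
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # YOLOX-X scaling factors
        self.depth = 1.33
        self.width = 1.25

        # Custom COCO-format dataset (placeholder paths / class count)
        self.data_dir = "datasets/my_dataset"
        self.train_ann = "instances_train.json"
        self.val_ann = "instances_val.json"
        self.num_classes = 5

        # Things tried above (default in yolox_base.py is 0.01 / 64.0 and 300 epochs)
        self.basic_lr_per_img = 0.01 / 64.0 / 10   # divided by 10 (or by 100)
        self.max_epoch = 100                       # reduced from 300, then to 75
```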

None of them worked. I don't know what else to do; does anyone have any idea?
Apart from that, I have checked my COCO-format labels in the platform I used to convert them, and they all seem fine. But maybe something goes wrong in the YOLOX dataloader; how can I easily visualize my ground truths after loading a batch during YOLOX training?
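
A minimal sketch of such a check, assuming the default TrainTransform label layout (each target row is [class_id, cx, cy, w, h] in pixels, zero-padded) and that the batch tensors come straight from the training loader; the helper name and output directory are made up:

```python
# Hypothetical debugging helper: call it on a batch, e.g. right after the
# "inps, targets = self.prefetcher.next()" line in yolox/core/trainer.py.
import os

import cv2
import numpy as np


def dump_batch_gt(imgs, targets, out_dir="gt_debug", prefix="batch"):
    """Save every image of a training batch with its ground-truth boxes drawn.

    Assumes imgs is a (B, 3, H, W) torch tensor in 0-255 range (the default
    TrainTransform does not normalize) and targets is (B, max_labels, 5) with
    rows of [class_id, cx, cy, w, h]; padding rows have w == h == 0.
    """
    os.makedirs(out_dir, exist_ok=True)
    for i in range(imgs.shape[0]):
        img = imgs[i].detach().cpu().numpy().transpose(1, 2, 0)  # CHW -> HWC
        img = np.ascontiguousarray(img.clip(0, 255).astype(np.uint8))
        for cls_id, cx, cy, w, h in targets[i].detach().cpu().numpy():
            if w <= 0 or h <= 0:  # skip zero-padded rows
                continue
            x1, y1 = int(cx - w / 2), int(cy - h / 2)
            x2, y2 = int(cx + w / 2), int(cy + h / 2)
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(img, str(int(cls_id)), (x1, max(y1 - 3, 0)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        cv2.imwrite(os.path.join(out_dir, f"{prefix}_{i}.jpg"), img)
```

If the drawn boxes do not line up with the objects, the problem is in the annotations or the dataloader rather than in the training hyperparameters.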


YCAyca commented Mar 27, 2024

Another weird thing is that all the loss terms are 0 from the first iteration, except conf_loss, which is equal to the total loss:
[screenshot of the training log: every loss term is 0 except conf_loss, which equals total_loss]

@benoitboidin

How many samples are there in your validation dataset? A common way to get NaN values is not having enough samples of each class to perform an evaluation.

For instance, if your validation dataset contains 0 "car" instances, your "car" mAP will always be NaN. And even if you have only a few cars, the validation batch may not contain any (since YOLOX uses a random subset of this dataset for each epoch's evaluation).
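
A quick way to check this is to count annotations and images per category in the validation JSON with pycocotools; a minimal sketch (the annotation path is a placeholder):

```python
# Count per-class annotations in a COCO-format validation file.
from pycocotools.coco import COCO

coco = COCO("datasets/my_dataset/annotations/instances_val.json")  # placeholder path
for cat in coco.loadCats(coco.getCatIds()):
    n_ann = len(coco.getAnnIds(catIds=[cat["id"]]))
    n_img = len(coco.getImgIds(catIds=[cat["id"]]))
    print(f"{cat['name']}: {n_ann} annotations across {n_img} images")
```

Any class that prints 0 (or close to 0) here is a candidate for the NaN entries described above.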
