EFL migrated to mmdetection gives low accuracy #27
Supplementary experiment: so, is the generalization ability of EFL insufficient?
Hi @EthanChen1234 , actually the unbearably low performance (3.2 mAP) of your code on mmdet may come from some missing details during the code migration. So I highly recommend you check the gradient collection function. If you use the hook mechanism to collect the gradient, you need to put the hook on the last layer except the classifier. If you calculate the gradient manually, please check your derivative formula carefully. If you still have difficulties, please feel free to ask questions here, and it would be better to provide your gradient collection function to help us locate the problem. Additionally, the log file you provided is not in UTF-8 format, and we could not open it.
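For reference, a minimal sketch of what hook-based gradient collection usually looks like. This is an illustration only, not the original UP implementation: the class and method names (GradCollector, attach, collect_grad) are made up, and it assumes (N, num_classes) sigmoid logits and one-hot targets.

```python
import torch

class GradCollector:
    """Illustrative only: accumulate per-class positive/negative gradients via a tensor hook."""

    def __init__(self, num_classes):
        self.pos_grad = torch.zeros(num_classes)
        self.neg_grad = torch.zeros(num_classes)
        # ratio starts at 1, so EFL behaves like plain FL until gradients accumulate
        self.pos_neg = torch.ones(num_classes)
        self.cache_target = None

    def attach(self, cls_score, targets):
        # cls_score: (N, num_classes) logits that require grad; targets: (N, num_classes) one-hot
        self.cache_target = targets.detach()
        # the hook receives d(loss)/d(logit) during backward
        cls_score.register_hook(self.collect_grad)

    def collect_grad(self, grad):
        grad = grad.detach().abs()
        self.pos_grad += (grad * self.cache_target).sum(dim=0)
        self.neg_grad += (grad * (1 - self.cache_target)).sum(dim=0)
        self.pos_neg = torch.clamp(self.pos_grad / (self.neg_grad + 1e-10), min=0, max=1)
```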
@waveboo Thanks for the quick reply!!! 1. Gradient collection hook 2. EFL implementation (gradient collection implementation) 3. Training log 4. Loss observation
Hi @EthanChen1234, I checked your hook position, and it seems correct.
@waveboo @Joker-co During training, I found that the positive-sample gradient is larger than the negative-sample gradient, and self.pos_neg stays at 1 (so EFL effectively becomes FL). That does not match expectations, does it?
Hi @EthanChen1234 , I have checked your training log. The positive gradient is indeed greater than the negative gradient most of the time, which means that EFL is equivalent to FL here. Your experiment demonstrates two things:
@waveboo Here is the mmdet training log.
@EthanChen1234 |
I have also used EFL on mmdet before, and I remember that self.pos_neg there was basically 1 as well.
@FL77N Hi, I still have some problems reproducing it on mmdet. Could you help me review it?
Hi, I used your reproduced loss code in mmdet. During my training, memory keeps growing until out of memory occurs. How can this be solved?
Hi, have you solved this problem?
@EthanChen1234 |
@shiyuanyu123 The reversed order is designed for the subnets. For example, ATSS has outputs from five levels. In the forward pass, EFL concatenates the gt labels corresponding to these outputs in front-to-back order and records them with cache_target. In the backward pass, the five outputs correspond to five hooks, each of which calls EFL's collect_grad; they are called in the reverse order of the five levels (similar to popping a stack). So once EFL has collected all five levels, it reverses the order of grad_buffer so that it matches the gt recorded in cache_target one-to-one.
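In other words, roughly like the sketch below, which continues the hypothetical collector shown earlier; grad_buffer, num_levels and cache_target are placeholder names, not necessarily the exact repo code.

```python
def collect_grad(self, grad):
    # hooks fire once per FPN level, in reverse level order (last level first)
    self.grad_buffer.append(grad.detach().abs())
    if len(self.grad_buffer) == self.num_levels:          # e.g. 5 levels for ATSS
        grads = torch.cat(self.grad_buffer[::-1], dim=0)  # flip back to forward (level 0..4) order
        targets = self.cache_target                       # targets were concatenated in forward order
        self.pos_grad += (grads * targets).sum(dim=0)
        self.neg_grad += (grads * (1 - targets)).sum(dim=0)
        self.pos_neg = torch.clamp(self.pos_grad / (self.neg_grad + 1e-10), min=0, max=1)
        self.grad_buffer = []
```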
@waveboo |
@shiyuanyu123 Right. At the first iteration the accumulated gradient ratio is initialized to 1, and at that point EFL is equivalent to ordinary FL. As the later iterations proceed, EFL keeps recomputing the gradient ratio from the accumulated positive and negative gradients and reassigns the learning weights accordingly.
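To make the role of the ratio concrete, here is a short sketch following the formulation in the EFL paper, where the per-class focusing factor is gamma_j = gamma_b + s * (1 - g_j) and g_j is the accumulated positive/negative gradient ratio; the variable names are illustrative, not necessarily the repo's.

```python
# self.pos_neg is the accumulated per-class ratio g_j, clamped to [0, 1]
map_val = 1 - self.pos_neg                                   # rare classes: g_j -> 0, map_val -> 1
dy_gamma = self.focal_gamma + self.scale_factor * map_val    # per-class gamma_j = gamma_b + s * (1 - g_j)
wf = dy_gamma / self.focal_gamma                              # re-weighting factor keeps the loss scale comparable
# at iteration 0, pos_neg == 1, so dy_gamma == focal_gamma and EFL reduces to plain FL
```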
@waveboo |
Hi!
When migrating EFL to mmdetection on a private dataset, the only change was replacing the native FocalLoss with EqualizedFocalLoss, and the resulting mAP of 0.032 is much lower than the native 0.425.
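Roughly speaking, the intended change on the mmdet side was only the classification loss in the RetinaNet config, something like the sketch below. It assumes the migrated loss is registered in mmdet's LOSSES registry under the name EqualizedFocalLoss; the hyperparameter names and values are illustrative.

```python
model = dict(
    bbox_head=dict(
        loss_cls=dict(
            type='EqualizedFocalLoss',   # was: type='FocalLoss'
            num_classes=80,              # set to the number of classes in the private dataset
            focal_gamma=2.0,
            focal_alpha=0.25,
            scale_factor=8.0,
            loss_weight=1.0)))
```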
Some questions arose during the migration:
1. Model differences, including:
In UP, the RetinaNet model uses RFS, iou_branch_loss, hand-crafted anchor generation, and ATSS positive/negative sample assignment.
In mmdet, bbox regression uses L1 Loss, 9 anchors are generated per location, and MaxIoUAssigner handles positive/negative sample assignment.
2. Observation of the positive/negative sample gradient ratio
self.pos_neg = torch.clamp(self.pos_grad / (self.neg_grad + 1e-10), min=0, max=1)  # per-class accumulated pos/neg gradient ratio
print(self.pos_neg)  # printed every iteration for observation
log.txt.tar.gz
During training, the positive/negative gradient ratio of the different classes barely changes. Does this mean EFL is not taking effect?
The EFL implementation, hyperparameters, etc. have been checked multiple times.
Could you help analyze what the problem might be?