This repository has been archived by the owner on Oct 19, 2022. It is now read-only.

Training model fails to converge #6

Closed
yjzst opened this issue Jul 9, 2020 · 21 comments

Comments

@yjzst commented Jul 9, 2020

I trained MobileFaceNet, but the results are poor. I split the dataset exactly the way you described, yet the loss only converges to around 4 and I can't reach the performance of the 24.pth you provided. Is there anything I might have missed?

@siriusdemon (Owner)

How long did you train for?

@yjzst (Author) commented Jul 9, 2020 via email

@siriusdemon (Owner)

You could try weight initialization, e.g. Kaiming or Xavier. 150 epochs is definitely enough.
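
For illustration, here is a minimal PyTorch sketch of Kaiming/Xavier initialization as suggested above; the `init_weights` helper and the set of layer types covered are assumptions for illustration, not the repository's actual code:

```python
import torch.nn as nn

def init_weights(model: nn.Module) -> None:
    # Kaiming init for conv layers, Xavier for linear layers,
    # and the usual ones/zeros for BatchNorm affine parameters.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            nn.init.xavier_normal_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            nn.init.ones_(m.weight)
            nn.init.zeros_(m.bias)

# Hypothetical usage: call once before training starts.
# model = MobileFaceNet()
# init_weights(model)
```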

@yjzst (Author) commented Jul 9, 2020 via email

@yjzst (Author) commented Jul 9, 2020 via email

@siriusdemon (Owner)

I haven't changed the code in this repository since then. As a point of reference, how high is your model's accuracy now?

@yjzst (Author) commented Jul 10, 2020 via email

@siriusdemon (Owner)

Did you change any of the training parameters? I still don't know where the problem is.

@siriusdemon changed the title from "Model performance issue" to "Training model fails to converge" on Jul 10, 2020
@yjzst (Author) commented Jul 10, 2020 via email

@siriusdemon (Owner)

There is something odd about this problem. The model doesn't use explicit weight initialization, so the issue could lie there, but judging from your feedback, it doesn't seem to. You could also try increasing the batch_size. I'm retraining with the default configuration now and will check later whether there is a problem; let's also see what other users report.

@siriusdemon pinned this issue Jul 10, 2020
@yjzst (Author) commented Jul 10, 2020 via email

@Linsongrong

Hi, I'm also running into the convergence problem. I trained for 150 epochs; the loss started at 11 and keeps oscillating between 8 and 9 without converging. Could it be the learning rate? The lr=0.1 in your code might be too large.
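
As an aside, a minimal sketch of one way to test that hypothesis, assuming a standard PyTorch SGD setup (the optimizer settings, milestones, and the `train_one_epoch` helper are illustrative assumptions, not the repository's defaults): either start from a smaller lr, or keep 0.1 and decay it on a step schedule.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# `model`, `num_epochs`, and `train_one_epoch` are hypothetical placeholders.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[40, 80, 120], gamma=0.1)

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)  # one pass over the training set
    scheduler.step()                   # lr: 0.1 -> 0.01 -> 0.001 -> 0.0001
```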

@yjzst (Author) commented Jul 10, 2020 via email

@yjzst (Author) commented Jul 10, 2020 via email

@Linsongrong commented Jul 10, 2020 via email

@siriusdemon (Owner) commented Jul 10, 2020

Training with the default configuration, I already get roughly 80% accuracy by the end of epoch 0.

Test Model: checkpoints/0.pth
Accuracy: 0.829
Threshold: 0.481

Test Model: checkpoints/3.pth
Accuracy: 0.884
Threshold: 0.451

Test Model: checkpoints/5.pth
Accuracy: 0.912
Threshold: 0.400

Loss at epoch 6:

Epoch 6/150, Loss: 10.049524307250977

The 24.pth provided in the project is the weight file after 24 epochs of training.

@yjzst (Author) commented Jul 10, 2020 via email

@siriusdemon (Owner)

If you have enough GPU memory during training, I'd suggest increasing the batch_size and training for about 20 epochs. If you want better results, I'd suggest (see the sketch after this list):

  • use a stronger model
  • train for longer (batch_size and the learning rate need to be weighed together)
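
For illustration, one common heuristic for weighing batch_size against the learning rate is the linear scaling rule: scale the learning rate in proportion to the batch size. A minimal sketch, assuming a baseline of batch_size=128 at lr=0.1 (these baseline values and the `model` variable are assumptions, not the project's confirmed defaults):

```python
import torch

base_batch_size = 128   # assumed reference configuration
base_lr = 0.1           # assumed reference learning rate

batch_size = 512        # larger batch, given enough GPU memory
lr = base_lr * batch_size / base_batch_size   # linear scaling rule -> 0.4

# `model` is a hypothetical placeholder for the network being trained.
optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                            momentum=0.9, weight_decay=5e-4)
```

In practice, a short warmup period is often paired with the scaled learning rate when the batch gets large.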

@yjzst (Author) commented Jul 10, 2020 via email

@Comedian1926

I've run into a similar problem. After trying bs=128, 256, and 512 with learning rates of 0.1, 0.01, and 0.001 respectively, the best accuracy I got on LFW is 94.5.

@yjzst (Author) commented Aug 7, 2020 via email
