Issue when reproducing the experiments #16

Open
1125690278 opened this issue Jul 14, 2020 · 8 comments

Comments

@1125690278

Hello, when I reproduce your experiment (without any modification), the accuracy improves steadily while fine-tuning the backbone, but during the distillation stage the dev and test acc of every epoch is identical to the last epoch of the backbone. Did I do something wrong somewhere?

@autoliuweijie
Owner

What did you set speed to for distillation? From the description it looks like speed=0.0, which would make every sample go all the way to the last layer of the backbone.
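
For reference, a minimal sketch of the gate that the speed threshold controls, assuming the normalized-entropy uncertainty described in the FastBERT paper (the function names here are illustrative, not the repo's actual code):

import torch

def normalized_entropy(probs: torch.Tensor) -> torch.Tensor:
    # Normalized entropy in [0, 1]: 0 = fully confident, 1 = uniform distribution.
    num_labels = probs.size(-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
    return entropy / torch.log(torch.tensor(float(num_labels)))

def should_exit_early(student_logits: torch.Tensor, speed: float) -> bool:
    # A sample leaves at a student classifier only if its uncertainty is below
    # the speed threshold; with speed=0.0 this is essentially never true, so
    # every sample runs through all backbone layers.
    probs = torch.softmax(student_logits, dim=-1)
    return normalized_entropy(probs).item() < speed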

@1125690278
Author

speed was 0.5; I used exactly the script you provided.

@autoliuweijie
Owner

Please paste the command you ran and the output printed to the terminal so I can take a look.

@1125690278
Author

Script:
CUDA_VISIBLE_DEVICES="0" python -u run_fastbert.py \
    --pretrained_model_path ./models/chinese_bert_base.bin \
    --vocab_path ./models/google_zh_vocab.txt \
    --train_path ./datasets/douban_book_review/train.tsv \
    --dev_path ./datasets/douban_book_review/dev.tsv \
    --epochs_num 3 --batch_size 32 --distill_epochs_num 5 \
    --encoder bert --fast_mode --speed 0.5 \
    --output_model_path ./models/douban_book_review_fastbert.bin
Results:
Epoch id: 3, backbone fine-tuning steps: 100, Avg loss: 0.593
Epoch id: 3, backbone fine-tuning steps: 200, Avg loss: 0.462
Epoch id: 3, backbone fine-tuning steps: 300, Avg loss: 0.493
Epoch id: 3, backbone fine-tuning steps: 400, Avg loss: 0.451
Epoch id: 3, backbone fine-tuning steps: 500, Avg loss: 0.452
Epoch id: 3, backbone fine-tuning steps: 600, Avg loss: 0.449
The number of evaluation instances: 9811
Fast mode: False
Number of model parameters: 85198850.0
FLOPs per sample in average: 10892624128.0
Acc. (Correct/Total): 0.7755 (7608/9811)
Start self-distillation for student-classifiers.
Epoch id: 1, self-distillation steps: 100, Avg loss: 0.532
Epoch id: 1, self-distillation steps: 200, Avg loss: 0.058
Epoch id: 1, self-distillation steps: 300, Avg loss: 0.040
Epoch id: 1, self-distillation steps: 400, Avg loss: 0.033
Epoch id: 1, self-distillation steps: 500, Avg loss: 0.029
Epoch id: 1, self-distillation steps: 600, Avg loss: 0.028
The number of evaluation instances: 9811
Fast mode: True
Number of model parameters: 87192600.0
FLOPs per sample in average: 7352265517.297727
Acc. (Correct/Total): 0.7755 (7608/9811)
Epoch id: 2, self-distillation steps: 100, Avg loss: 0.031
Epoch id: 2, self-distillation steps: 200, Avg loss: 0.023
Epoch id: 2, self-distillation steps: 300, Avg loss: 0.021
Epoch id: 2, self-distillation steps: 400, Avg loss: 0.022
Epoch id: 2, self-distillation steps: 500, Avg loss: 0.022
Epoch id: 2, self-distillation steps: 600, Avg loss: 0.022
The number of evaluation instances: 9811
Fast mode: True
Number of model parameters: 87192600.0
FLOPs per sample in average: 7641473334.97829
Acc. (Correct/Total): 0.7755 (7608/9811)
Epoch id: 3, self-distillation steps: 100, Avg loss: 0.025
Epoch id: 3, self-distillation steps: 200, Avg loss: 0.019
Epoch id: 3, self-distillation steps: 300, Avg loss: 0.019
Epoch id: 3, self-distillation steps: 400, Avg loss: 0.017
Epoch id: 3, self-distillation steps: 500, Avg loss: 0.018
Epoch id: 3, self-distillation steps: 600, Avg loss: 0.019
The number of evaluation instances: 9811
Fast mode: True
Number of model parameters: 87192600.0
FLOPs per sample in average: 7627017668.168383
Acc. (Correct/Total): 0.7755 (7608/9811)
Epoch id: 4, self-distillation steps: 100, Avg loss: 0.023
Epoch id: 4, self-distillation steps: 200, Avg loss: 0.019
Epoch id: 4, self-distillation steps: 300, Avg loss: 0.018
Epoch id: 4, self-distillation steps: 400, Avg loss: 0.018
Epoch id: 4, self-distillation steps: 500, Avg loss: 0.017
Epoch id: 4, self-distillation steps: 600, Avg loss: 0.017
The number of evaluation instances: 9811
Fast mode: True
Number of model parameters: 87192600.0
FLOPs per sample in average: 7627017668.168383
Acc. (Correct/Total): 0.7755 (7608/9811)
Epoch id: 5, self-distillation steps: 100, Avg loss: 0.023
Epoch id: 5, self-distillation steps: 200, Avg loss: 0.018
Epoch id: 5, self-distillation steps: 300, Avg loss: 0.018
Epoch id: 5, self-distillation steps: 400, Avg loss: 0.018
Epoch id: 5, self-distillation steps: 500, Avg loss: 0.018
Epoch id: 5, self-distillation steps: 600, Avg loss: 0.018
The number of evaluation instances: 9811
Fast mode: True
Number of model parameters: 87192600.0
FLOPs per sample in average: 7627017668.168383
Acc. (Correct/Total): 0.7755 (7608/9811)

@autoliuweijie
Owner

Judging from the self-distillation results, the FLOPs do drop while the Acc stays unchanged.

But this Acc is far below what it should be on the Book review dataset. Please make sure that ./models/chinese_bert_base.bin is the correct file, and that you are running Python 3.
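
Two quick checks for those points, as a rough sketch (loading the checkpoint directly with torch.load is an assumption here; compare against how run_fastbert.py loads it):

import sys
import torch

print(sys.version)  # should report a 3.x interpreter

# Load the pretrained weights on CPU and inspect them; a wrong or corrupted
# file usually either fails to load or shows unexpected parameter names/sizes.
state = torch.load("./models/chinese_bert_base.bin", map_location="cpu")
print(type(state), len(state))
print(list(state.keys())[:5])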

@1125690278
Author

I've double-checked; everything was downloaded from the links you provided.

@autoliuweijie
Owner

You could try the PyPI version: https://github.com/autoliuweijie/FastBERT/tree/master/pypi
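
Roughly, the PyPI package is driven like the sketch below (adapted from memory of that README, so the exact argument names are an assumption and should be checked against the pypi page; the example sentences are made up):

from fastbert import FastBERT

# Tiny made-up training set; replace with the douban_book_review sentences/labels.
sents_train = ["这本书很好看", "情节太差了"]
labels_train = ["1", "0"]

model = FastBERT(
    kernel_name="google_bert_base_zh",  # Chinese BERT-base kernel
    labels=["0", "1"],
    device="cuda:0",
)

# Fine-tune the backbone and self-distill the student classifiers.
model.fit(
    sents_train,
    labels_train,
    model_saving_path="./fastbert_douban.bin",
)

# Inference with early exit controlled by the speed threshold.
label, exec_layers = model("这本书很好看", speed=0.5)
print(label, exec_layers)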

@NovemberSun

Has this issue been resolved? In my experiments the self-distillation result also matches the last backbone epoch, and the accuracy does not change at all from the 5th through the 10th self-distillation epoch.
