WikiDE - sp30k - nl4 #4

Open · PiotrCzapla opened this issue Sep 24, 2018 · 9 comments
@PiotrCzapla
No description provided.

@PiotrCzapla

pretrain_lm

```
BS=192
nl=4
cuda=0
python fastai_scripts/pretrain_lm.py --dir-path "${dir}" --cuda-id $cuda --cl 12 --bs "${BS}" --lr 0.01 --pretrain-id "nl-${nl}-small-minilr" --sentence-piece-model sp.model --nl "${nl}"

train_lm(dir_path=work/wiki30k, cuda_id=0, cl=12, bs=192, backwards=False, lr=0.01, sampled=True, pretrain_id=nl-4-small-minilr, sentence_piece_model=sp.model, batch_sets=1, em_sz=400, nh=1150, nl=4)
Tokens to words fraction: 1.5994601551349397

epoch      trn_loss   val_loss   accuracy
    0      4.480352   3.744285   0.38241
    1      4.350872   3.591883   0.399484
    2      4.273034   3.508802   0.409233
    3      4.213831   3.438031   0.417689
    4      4.169942   3.398202   0.424298
    5      4.129915   3.357917   0.429928
    6      4.082751   3.310802   0.436389
    7      4.027577   3.271201   0.442223
    8      3.9906     3.236399   0.448125
    9      3.938214   3.208448   0.454097
   10      3.904736   3.183524   0.459883
   11      3.852059   3.159713   0.465222

12/12 epochs in 21:55:27 (~6577 s/epoch)
```

@PiotrCzapla commented Sep 24, 2018

```
$ python fastai_scripts/infer.py --dir-path "${dir}" --cuda-id $cuda --bs 16 \
    --pretrain-id "nl-${nl}-small-minilr" --sentence-piece-model sp.model \
    --test_set tmp/val_ids.npy --correct_for_up=False --nl "${nl}"
infer(dir_path=work/wiki30k, test_set=tmp/val_ids.npy, cuda_id=0, bs=16, pretrain_id=nl-4-small-minilr, sentence_piece_model=sp.model, correct_for_up=False, limit=None, em_sz=400, nh=1150, nl=4, use_tqdm=True)
27002: dir_path work/wiki30k; cuda_id 0; bs 16; limit: None; pretrain_id nl-4-small-minilr_ em_sz 400 nh 1150 nl 4
27002: {'tokens_total': 12489514, 'subword_tokens_total': 19976480, 'oov': 0, 'vs': 30000}
41344/41344 batches in 28:09 (24.48 it/s)
27002: Cross entropy: 5.061666488647461, Perplexity: 157.85336303710938
```
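The reported perplexity is just the exponential of the subword-level cross entropy, and the stats dict makes the tokens-to-words fraction explicit. A minimal sanity check, using only numbers from the log above:

```python
import math

# Numbers taken from the infer.py output above.
cross_entropy = 5.061666488647461            # mean per-subword-token CE
subword_tokens, words = 19976480, 12489514   # from the stats dict

print(math.exp(cross_entropy))   # ~157.853 -- the reported perplexity
print(subword_tokens / words)    # ~1.5995  -- the "tokens to words fraction"

# Note (assumption, not computed by the scripts): comparing against a
# word-level LM would require scaling the CE by this fraction first.
```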

@PiotrCzapla commented Sep 24, 2018

GE17

```
predir=work/wiki30k
destdir=work/wikige2017
BS=128
cuda=0
nl=4
python ./fastai_scripts/finetune_lm.py --dir-path "${destdir}" --pretrain-path "${predir}" --cuda-id $cuda \
    --cl 6 --pretrain-id "nl-${nl}-small-minilr" --lm-id "nl-${nl}-finetune" --bs $BS --lr 0.001 \
    --use_discriminative True --dropmult 0.5 --sentence-piece-model sp.model --sampled True --nl "${nl}"

train_lm(dir_path=work/wikige2017, pretrain_path=work/wiki30k, cuda_id=0, cl=6, pretrain_id=nl-4-small-minilr, lm_id=nl-4-finetune, bs=128, dropmult=0.5, backwards=False, lr=0.001, preload=True, bpe=False, startat=0, use_clr=True, use_regular_schedule=False, use_discriminative=True, notrain=False, joined=False, train_file_id=, early_stopping=False, sentence_piece_model=sp.model, sampled=True, batch_sets=1, em_sz=400, nh=1150, nl=4)
Loading work/wikige2017/tmp/trn_ids.npy and work/wikige2017/tmp/val_ids.npy
Tokens to words fraction: 1.778323439377108
Loading LM weights ( work/wiki30k/models/fwd_nl-4-small-minilr.h5 )...

epoch      trn_loss   val_loss   accuracy
    0      5.04466    4.308868   0.332463
    1      4.637154   4.101497   0.361402
    2      4.494651   4.023696   0.371602
    3      4.420981   3.986701   0.376534
    4      4.356632   3.955569   0.380808
    5      4.319008   3.950476   0.381265

6/6 epochs in 10:51 (~108 s/epoch)
```
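`--use_discriminative True` enables ULMFiT's discriminative fine-tuning: each layer group gets its own learning rate, with lower layers updated more gently. A minimal sketch of the idea; the 2.6 divisor is the value suggested in the ULMFiT paper, and the exact ratios `finetune_lm.py` uses may differ:

```python
# Discriminative fine-tuning: decay the LR by a constant factor per
# layer group, top group first (Howard & Ruder, 2018 suggest 2.6).
def discriminative_lrs(base_lr: float, n_groups: int, factor: float = 2.6):
    # Highest group gets base_lr; each lower group gets lr / factor.
    return [base_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

print(discriminative_lrs(0.001, n_groups=5))
# With 5 groups the embedding group trains ~45x (2.6^4) slower than the head.
```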

@PiotrCzapla commented Sep 24, 2018

```
predir=work/wiki30k
destdir=work/wikige2017
BS=40
cuda=0
nl=4
python ./fastai_scripts/train_clas.py --dir-path="$destdir" --cuda-id=$cuda \
    --lm-id="nl-${nl}-finetune" --clas-id="nl-${nl}-v1" \
    --bs=$BS --cl=5 --lr=0.001 --dropmult 0.5 --sentence-piece-model='sp.model' \
    --nl $nl --use_discriminative True

dir_path work/wikige2017; cuda_id 0; lm_id nl-4-finetune; clas_id nl-4-v1; bs 40; cl 5; backwards False; dropmult 0.5 unfreeze True startat 0; bpe False; use_clr True; use_regular_schedule False; use_discriminative True; last False; chain_thaw False; from_scratch False; train_file_id
INFO: training set len 20941 divided by 20 is 1, removing that last batch of 1 to avoid exceptions

epoch      trn_loss   val_loss   accuracy
    0      0.640876   0.597957   0.753096

epoch      trn_loss   val_loss   accuracy
    0      0.623045   0.575396   0.764706

epoch      trn_loss   val_loss   accuracy
    0      0.574206   0.548167   0.776316
    1      0.562006   0.51578    0.784056
    2      0.446546   0.517741   0.794892
    3      0.446296   0.499478   0.792183
    4      0.393079   0.512004   0.789861
```

Evaluation

```
$ destdir=work/wikige2017
BS=120
cuda=0
nl=4
python ./ulmfit/evaluate.py --dir-path="$destdir" --cuda-id=$cuda \
    --clas-id="nl-${nl}-v1" --bs=$BS --nl $nl
Loading work/wikige2017/models/fwd_nl-4-v1_clas_1.h5
Test file: test1
F1 score: 0.765003897116134
Confusion matrix
 [[1457  206   18]
 [ 258  488   34]
 [  72   15   18]]
(0.765003897116134, array([[1457,  206,   18], [ 258,  488,   34], [  72,   15,   18]]))
```

```
$ destdir=work/wikige2017
BS=120
cuda=0
nl=4
python ./ulmfit/evaluate.py --dir-path="$destdir" --cuda-id=$cuda \
    --clas-id="nl-${nl}-v1" --bs=$BS --nl $nl --test-file test2
Loading work/wikige2017/models/fwd_nl-4-v1_clas_1.h5
Test file: test2
F1 score: 0.7812160694896851
Confusion matrix
 [[1053  157   27]
 [ 137  353    7]
 [  51   24   33]]
(0.7812160694896851, array([[1053,  157,   27], [ 137,  353,    7], [  51,   24,   33]]))
```
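The single "F1 score" printed here appears to be micro-averaged, which for a single-label multi-class task reduces to accuracy, i.e. the trace of the confusion matrix over its sum. A quick check against the test1 matrix (assuming rows are true labels and columns predictions):

```python
import numpy as np

# test1 confusion matrix from the log above.
cm = np.array([[1457, 206, 18],
               [ 258, 488, 34],
               [  72,  15, 18]])

micro_f1 = np.trace(cm) / cm.sum()  # equals accuracy for single-label tasks
print(micro_f1)                     # 0.765003897116134, as reported
```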

@PiotrCzapla commented Sep 25, 2018

GE18 - CAT

```
predir=work/wiki30k
destdir=work/wiki-ge2018-cat
BS=128
cuda=2
nl=4
python ./fastai_scripts/finetune_lm.py --dir-path "${destdir}" --pretrain-path "${predir}" --cuda-id $cuda \
    --cl 6 --pretrain-id "nl-${nl}-small-minilr" --lm-id "nl-${nl}-finetune" --bs $BS --lr 0.001 \
    --use_discriminative True --dropmult 0.5 --sentence-piece-model sp.model --sampled True --nl "${nl}" --train-file-id all

train_lm(dir_path=work/wiki-ge2018-cat, pretrain_path=work/wiki30k, cuda_id=2, cl=6, pretrain_id=nl-4-small-minilr, lm_id=nl-4-finetune, bs=128, dropmult=0.5, backwards=False, lr=0.001, preload=True, bpe=False, startat=0, use_clr=True, use_regular_schedule=False, use_discriminative=True, notrain=False, joined=False, train_file_id=all, early_stopping=False, sentence_piece_model=sp.model, sampled=True, batch_sets=1, em_sz=400, nh=1150, nl=4)
Loading work/wiki-ge2018-cat/tmp/trn_ids_all.npy and work/wiki-ge2018-cat/tmp/val_ids.npy
Tokens to words fraction: 2.0697860962566845
Loading LM weights ( work/wiki30k/models/fwd_nl-4-small-minilr.h5 )...

epoch      trn_loss   val_loss   accuracy
    0      5.325815   4.266432   0.356295
    1      5.058677   3.946336   0.391423
    2      4.886464   3.778683   0.409474
    3      4.773062   3.68521    0.421213
    4      4.681145   3.634211   0.427184
    5      4.616828   3.61728    0.42883

6/6 epochs in 01:23 (~14 s/epoch)
```

@PiotrCzapla commented Sep 25, 2018

```
BS=80
python ./fastai_scripts/train_clas.py --dir-path="$destdir" --cuda-id=$cuda \
    --lm-id="nl-${nl}-finetune" --clas-id="nl-${nl}-v1" \
    --bs=$BS --cl=5 --lr=0.001 --dropmult 0.5 --sentence-piece-model='sp.model' \
    --nl $nl --use_discriminative True

dir_path work/wiki-ge2018-cat; cuda_id 2; lm_id nl-4-finetune; clas_id nl-4-v1; bs 80; cl 5; backwards False; dropmult 0.5 unfreeze True startat 0; bpe False; use_clr True; use_regular_schedule False; use_discriminative True; last False; chain_thaw False; from_scratch False; train_file_id

epoch      trn_loss   val_loss   accuracy
    0      1.135915   1.015744   0.663626

epoch      trn_loss   val_loss   accuracy
    0      0.919945   0.856781   0.693349

epoch      trn_loss   val_loss   accuracy
    0      0.807964   0.778169   0.703355
    1      0.688997   0.765653   0.703061
    2      0.641219   0.75945    0.708064
    3      0.596655   0.758918   0.713361
    4      0.568349   0.774852   0.712184
Plotting lrs...
```
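The three tables above correspond to ULMFiT's gradual unfreezing: one epoch with only the classifier head trainable, one with the top LSTM group also unfrozen, then cl=5 epochs with everything unfrozen. A self-contained PyTorch sketch of the mechanism (the layer groups here are hypothetical stand-ins, not the exact groups `train_clas.py` builds):

```python
import torch.nn as nn

def freeze_to(groups, n):
    """Make layer groups from index n upward trainable, freeze the rest."""
    n = n % len(groups)
    for i, group in enumerate(groups):
        for p in group.parameters():
            p.requires_grad = i >= n

# Hypothetical layer groups: embedding, two LSTM layers, classifier head.
groups = [nn.Embedding(30000, 400),
          nn.LSTM(400, 1150),
          nn.LSTM(1150, 400),
          nn.Linear(400, 3)]

freeze_to(groups, -1)  # stage 1: head only            (first 1-epoch table)
freeze_to(groups, -2)  # stage 2: head + top LSTM      (second 1-epoch table)
freeze_to(groups, 0)   # stage 3: everything trainable (final cl=5 epochs)
```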

@PiotrCzapla commented Sep 25, 2018

GE18 - BIN

```
predir=work/wiki30k
destdir=work/wiki-ge2018-bin
BS=128
cuda=2
nl=4
python ./fastai_scripts/finetune_lm.py --dir-path "${destdir}" --pretrain-path "${predir}" --cuda-id $cuda \
    --cl 6 --pretrain-id "nl-${nl}-small-minilr" --lm-id "nl-${nl}-finetune" --bs $BS --lr 0.001 \
    --use_discriminative True --dropmult 0.5 --sentence-piece-model sp.model --sampled True --nl "${nl}" \
    --train-file-id all

train_lm(dir_path=work/wiki-ge2018-bin, pretrain_path=work/wiki30k, cuda_id=2, cl=6, pretrain_id=nl-4-small-minilr, lm_id=nl-4-finetune, bs=128, dropmult=0.5, backwards=False, lr=0.001, preload=True, bpe=False, startat=0, use_clr=True, use_regular_schedule=False, use_discriminative=True, notrain=False, joined=False, train_file_id=all, early_stopping=False, sentence_piece_model=sp.model, sampled=True, batch_sets=1, em_sz=400, nh=1150, nl=4)
Loading work/wiki-ge2018-bin/tmp/trn_ids_all.npy and work/wiki-ge2018-bin/tmp/val_ids.npy
Tokens to words fraction: 2.095647472289808
Loading LM weights ( work/wiki30k/models/fwd_nl-4-small-minilr.h5 )...

epoch      trn_loss   val_loss   accuracy
    0      5.334776   4.308463   0.354172
    1      5.063684   4.022039   0.383898
    2      4.887977   3.880775   0.399305
    3      4.767955   3.80927    0.408304
    4      4.677004   3.771819   0.412356
    5      4.618461   3.765508   0.413374

6/6 epochs in 01:09 (~12 s/epoch)
```

```
destdir=work/wiki-ge2018-bin
cuda=2
nl=4
BS=80
python ./fastai_scripts/train_clas.py --dir-path="$destdir" --cuda-id=$cuda \
    --lm-id="nl-${nl}-finetune" --clas-id="nl-${nl}-v1" \
    --bs=$BS --cl=5 --lr=0.001 --dropmult 0.5 --sentence-piece-model='sp.model' \
    --nl $nl --use_discriminative True

dir_path work/wiki-ge2018-bin; cuda_id 2; lm_id nl-4-finetune; clas_id nl-4-v1; bs 80; cl 5; backwards False; dropmult 0.5 unfreeze True startat 0; bpe False; use_clr True; use_regular_schedule False; use_discriminative True; last False; chain_thaw False; from_scratch False; train_file_id

epoch      trn_loss   val_loss   accuracy
    0      0.567768   0.570559   0.703631

epoch      trn_loss   val_loss   accuracy
    0      0.533525   0.562706   0.71737

epoch      trn_loss   val_loss   accuracy
    0      0.508378   0.538856   0.724239
    1      0.468444   0.553419   0.729146
    2      0.44237    0.567014   0.724239
    3      0.40841    0.563597   0.73209
    4      0.357289   0.570082   0.737978
Plotting lrs...
```

@PiotrCzapla commented Sep 25, 2018

```
destdir=work/wiki-ge2018-bin
BS=120
cuda=1
nl=4
python ./ulmfit/evaluate.py --dir-path="$destdir" --cuda-id=$cuda \
    --clas-id="nl-${nl}-v1" --bs=$BS --nl $nl --test-file test --classes 2
Loading work/wiki-ge2018-bin/models/fwd_nl-4-v1_clas_1.h5
Test file: test
F1 score micro avg: 0.7613301942319011
F1 score macro avg: 0.6999382079881011
Precision macro avg: 0.7528976051279876
Recall macro avg: 0.6868907628036516
Special F1 macro* avg: 0.7183811479771038
Binary
F1 score bin: 0.5642127888232134
Precision: 0.7383966244725738
Recall: 0.45652173913043476
Confusion matrix
 [[2062  186]
 [ 625  525]]
```
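The binary numbers follow directly from the 2x2 confusion matrix, taking the second row/column (the offensive class) as positive. A quick check:

```python
import numpy as np

# Binary confusion matrix from the log; rows true, cols predicted (assumed),
# class 1 (second row) taken as the positive class.
cm = np.array([[2062, 186],
               [ 625, 525]])
tp, fp, fn = cm[1, 1], cm[0, 1], cm[1, 0]

precision = tp / (tp + fp)                          # 525/711  ~ 0.73840
recall = tp / (tp + fn)                             # 525/1150 ~ 0.45652
f1 = 2 * precision * recall / (precision + recall)  # ~ 0.56421
print(precision, recall, f1)                        # matches the log
```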

@PiotrCzapla commented Sep 26, 2018

I made a few attempts to break the 0.71 score using the 30k vocabulary; all of them failed:

  • finetuning on a larger set of tweets (including btw17)
  • training for longer (cl = 6, 12)
  • adding more dropout (dropmult set to 0.8 and 1.0)

I haven't tried ensembling any models.

Here are some evaluations:
Training from the LM finetuned for 12 epochs on btw17+ge18:

```
destdir=work/wiki-ge2018-bin
BS=120
cuda=1
nl=4
python ./ulmfit/evaluate.py --dir-path="$destdir" --cuda-id=$cuda \
    --clas-id="nl-${nl}-6-more12" --bs=$BS --nl $nl --test-file test --classes 2
Loading work/wiki-ge2018-bin/models/fwd_nl-4-6-more12_clas_1.h5
Test file: test
F1 score micro avg: 0.7530900529723367
F1 score macro avg: 0.6918314952072595
Precision macro avg: 0.738247301562112
Recall macro avg: 0.680025916756924
Special F1 macro* avg: 0.7079415891856333
Binary
F1 score bin: 0.5544344131704726
Precision: 0.7121418826739427
Recall: 0.4539130434782609
Confusion matrix
 [[2037  211]
 [ 628  522]]
```

Training from the LM finetuned for 6 epochs on btw17+ge18:

```
destdir=work/wiki-ge2018-bin
BS=120
cuda=1
nl=4
python ./ulmfit/evaluate.py --dir-path="$destdir" --cuda-id=$cuda \
    --clas-id="nl-${nl}-6-more6" --bs=$BS --nl $nl --test-file test --classes 2
Loading work/wiki-ge2018-bin/models/fwd_nl-4-6-more6_clas_1.h5
Test file: test
F1 score micro avg: 0.7448499117127723
F1 score macro avg: 0.678137961662567
Precision macro avg: 0.7289398533902254
Recall macro avg: 0.6674272783537057
Special F1 macro* avg: 0.6968286940758577
Binary
F1 score bin: 0.5316045380875203
Precision: 0.7018544935805991
Recall: 0.42782608695652175
Confusion matrix
 [[2039  209]
 [ 658  492]]
```

Training for 12 epochs with dropout 1.0 from ft_16 (16-epoch finetune, dropout 1.0):

```
destdir=work/wiki-ge2018-bin
BS=120
cuda=1
nl=4
python ./ulmfit/evaluate.py --dir-path="$destdir" --cuda-id=$cuda \
    --clas-id="nl-${nl}-v3" --bs=$BS --nl $nl --test-file test --classes 2 --squeeze-bin True
Loading work/wiki-ge2018-bin/models/fwd_nl-4-v3_clas_1.h5
Converting mutli classification in to binary classificaiton
Test file: test
F1 score micro avg: 0.7616244849911713
F1 score macro avg: 0.7089119038594307
Precision macro avg: 0.7444673178379168
Recall macro avg: 0.6968818660064985
Special F1 macro* avg: 0.7198890864904286
Binary
F1 score bin: 0.5850409836065574
Precision: 0.7119700748129676
Recall: 0.4965217391304348
Confusion matrix
 [[2017  231]
 [ 579  571]]
```
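`--squeeze-bin True` collapses the fine-grained multi-class predictions to the binary task before scoring (the "Converting multi classification into binary classification" line). A hedged sketch of what such a mapping could look like; the label order, and that index 0 is the OTHER class, are assumptions, not read from `evaluate.py`:

```python
import numpy as np

def squeeze_to_binary(preds, other_idx=0):
    # Any fine-grained offense subclass collapses to the positive class.
    # other_idx is an assumption; check the label order evaluate.py uses.
    return (np.asarray(preds) != other_idx).astype(int)

print(squeeze_to_binary([0, 2, 1, 0, 3]))  # -> [0 1 1 0 1]
```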

Training for 12 epochs with dropout 1.0 from ft_16 (16-epoch finetune, dropout 1.0):

```
destdir=work/wiki-ge2018-bin
BS=120
cuda=1
nl=4
python ./ulmfit/evaluate.py --dir-path="$destdir" --cuda-id=$cuda \
    --clas-id="nl-${nl}-v2" --bs=$BS --nl $nl --test-file test --classes 2 --squeeze-bin True
Loading work/wiki-ge2018-bin/models/fwd_nl-4-v2_clas_1.h5
Converting mutli classification in to binary classificaiton
Test file: test
F1 score micro avg: 0.7536786344908769
F1 score macro avg: 0.700575545370007
Precision macro avg: 0.7324490664051442
Recall macro avg: 0.6896023518489866
Special F1 macro* avg: 0.7103802187725357
Binary
F1 score bin: 0.5744789018810371
Precision: 0.6915544675642595
Recall: 0.49130434782608695
Confusion matrix
 [[1996  252]
 [ 585  565]]
```
