python3 training_pipeline.py "model/2020-03-28_22-39-28/epoch0_batch11.par"
python3 gen_scores.py <model_path> <dataset_file_path.json> [optional eval freq.] [optional eval start step]
- Concrete example (training set):
python3 gen_scores.py ./model/MI1_dropout_encodings_only/ preprocessing/data/subset-1/train-subset-1.json 2000 50000
-- this will eval the model at steps 50000, 52000, 54000, ... up to the most recent one.
- Concrete example (dev set):
python3 gen_scores.py ./model/MI1_dropout_encodings_only/ preprocessing/data/dev-v2.0.json
- The dataset file path needs to be `something.json` and have a corresponding `something-tokenized.json` for this script to work!
- The script will generate a file `scores_<datasetname>.log` in the model folder, as well as two plots (EM and F1).
- To copy the plots to your computer run:
scp -T [email protected]:"'Documents/no_eating_no_drinking/model/MI1_dropout_encodings_only/plot_loss_vs_em_score(train-subset-1).png'" . && scp -T [email protected]:"'Documents/no_eating_no_drinking/model/MI1_dropout_encodings_only/plot_loss_vs_f1_score(train-subset-1).png'" . &&
(or same but withdev-v2
replacingtrain-subset-1
).
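The naming convention and evaluation schedule described above can be sketched in a few lines of Python. The helper names (`tokenized_path`, `eval_steps`) are illustrative only and do not come from the repo:

```python
import os

def tokenized_path(dataset_path):
    """Derive the 'something-tokenized.json' path gen_scores.py expects
    next to 'something.json' (hypothetical helper, for illustration)."""
    base, ext = os.path.splitext(dataset_path)
    assert ext == ".json", "dataset file must end in .json"
    return base + "-tokenized" + ext

def eval_steps(latest_step, eval_freq=2000, eval_start=0):
    """Checkpoint steps implied by the optional args:
    eval_start, eval_start + eval_freq, ... up to the latest step."""
    return list(range(eval_start, latest_step + 1, eval_freq))
```

For example, `tokenized_path("preprocessing/data/subset-1/train-subset-1.json")` yields the `train-subset-1-tokenized.json` path in the same folder, and `eval_steps(54000, 2000, 50000)` yields `[50000, 52000, 54000]`, matching the concrete example above.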
- Generate predictions on SQuAD dev set:
python3 produce_answers.py model/2020-04-01_01-07-06/epoch0_batch791.par
- Generate predictions on a different dataset:
python3 produce_answers.py model/2020-04-01_01-07-06/epoch0_batch791.par preprocessing/data/subset-1/train-subset-1-tokenized.json [optional_prediction_file_path]
- Run evaluation:
python3 evaluate-v2.0.py preprocessing/data/subset-1/train-subset-1.json predictions.json
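`evaluate-v2.0.py` is the official SQuAD 2.0 evaluation script, which expects the prediction file to map each question id to a single answer string (an empty string marks a "no answer" prediction). A minimal sketch of producing such a file; the question ids below are made up for illustration:

```python
import json

# Map question id -> predicted answer string (illustrative ids, not real SQuAD ids)
predictions = {
    "example-question-id-1": "Denver Broncos",
    "example-question-id-2": "",  # predicted unanswerable
}

with open("predictions.json", "w") as f:
    json.dump(predictions, f)
```

`produce_answers.py` writes a file in this shape, which is then passed to `evaluate-v2.0.py` together with the (non-tokenized) dataset file.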
- First generate the scores log file using `gen_scores.py` (see separate instructions for that).
- Then:
python3 plot_f1_vs_loss.py model/mymodel/scores_train-subset-1.log
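The exact format of the `scores_<datasetname>.log` file is not documented in this README; as a rough sketch, a parser for a hypothetical whitespace-separated `step loss em f1` record per line might look like this, just to show the kind of data `plot_f1_vs_loss.py` consumes:

```python
def parse_scores_log(text):
    """Parse assumed 'step loss em f1' lines into tuples, skipping
    anything that does not have exactly four fields."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip headers / blank lines
        step, loss, em, f1 = parts
        records.append((int(step), float(loss), float(em), float(f1)))
    return records

sample = "50000 2.31 61.2 64.8\n52000 2.10 62.0 65.5"
records = parse_scores_log(sample)
```

The parsed steps would go on the x-axis, with loss on one y-axis and EM/F1 on the other, as in the generated plots.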
- Model: https://github.com/atulkum/co-attention/blob/master/code/model.py
- Batcher: https://github.com/atulkum/co-attention/blob/master/code/data_util/data_batcher.py
- GDrive: https://drive.google.com/drive/folders/1n5V3Je-qcuncPhkDYikIbAxJ_68d1p4E
- Colab: https://colab.research.google.com/drive/1ycVllF_XIsXDvC4qOAMr4HRTeRaEBFNb
- Add your past contributions or near-future work here. (everyone)
- Move tests to separate file (Richie)
- Implement model (everyone)
- Enable cuda usage (Kuba)
- Get forward pass to run (Kuba)
- Get backward pass to run (Kuba -- this was quick)
- Debug why predicted end indices are all 0 (Richie)
- Complete batching (Asmita)
- Create word2id, id2word, embedding matrix (Asmita)
- Training pipeline (Asmita + Kuba + Richie + Dip)
- Model serialisation (Kuba + Richie)
- Debug `retain_graph` error (Dip)
- Debug training issues (Dip with help from Kuba and Richie)
- Quick script to evaluate latest version of a model automatically (Kuba?)
- Ablation tests:
- single iteration for start/end indices instead of 4.
- smaller HIDDEN_DIM
- try removing some modules or replacing them with something simpler, e.g. coattention with some fully connected layers.
- Think of more ablation tests. Take ones from the paper.
- Plots:
- Automate computation of F1/EM scores throughout a model's evolution (training) (Kuba)
- Plotting F1/EM scores on top of loss (Kuba)
- Prepare loss tables (discussed in the gdoc)
- Plotting scores depending on true span length (Dip)
- Generate predictions for evaluation (TODO: batching if needed, better conversion from tokens to answer strings, load serialised model) (Dip)