This is the final project for Stanford's CS 224N class, adapted from here. The project implements, uses, and improves upon the BERT model to perform sentiment analysis, paraphrase detection, and semantic textual similarity. We implement Projected Attention Layers (PALs), adapters, and prefix tuning to achieve strong multi-task performance while remaining parameter-efficient. We also experiment with changes to the BERT architecture by implementing Sentence-BERT and modifying the downstream classifier heads.
Please refer to the report for further details.
- Follow `setup.sh` to set up a conda environment and install dependencies.
- See `STRUCTURE.md` for a detailed description of the code structure, including which parts you will need to implement.
```bash
python multitask_classifier.py --option [finetune/pretrain] --use_gpu \
    --output_dir OUTPUT_DIR \
    --epochs 25 --lr 1e-5 --lr_adapt 1e-4 --warmup_portion 0.1 \
    --batch_size 16 --steps_per_epoch 2400 --eval_interval 4 \
    --gradient_accumulation_step 1 \
    --hidden_dropout_prob 0.1 \
    --sample [rr, squareroot, anneal] \
    --config_path CONFIG_PATH \
    --similarity_classifier_type ['linear', 'cosine-similarity'] \
    --paraphrase_classifier_type ['linear', 'cosine-similarity'] \
    --pooling_type ['cls', 'mean', 'max'] \
    --classification_concat_type ['naive', 'add-abs']
```
- Training pipeline
  - `--sample`: multi-task sampling strategy (`rr`, `squareroot`, or `anneal`) described in Stickland and Murray (2019).
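A minimal sketch of these sampling strategies, assuming the annealing schedule from Stickland and Murray (2019) in which the exponent decays from 1 toward 0.2 over training; the helper name `task_probs` and the dataset sizes in the example are illustrative, not the repository's exact implementation:

```python
import random

def task_probs(dataset_sizes, strategy, epoch=1, total_epochs=1):
    """Return a sampling probability for each task."""
    if strategy == "rr":
        # Round-robin: every task weighted equally.
        probs = [1.0] * len(dataset_sizes)
    elif strategy == "squareroot":
        # Sample proportionally to the square root of each dataset size.
        probs = [n ** 0.5 for n in dataset_sizes]
    elif strategy == "anneal":
        # Annealed sampling: start roughly proportional to dataset size and
        # anneal the exponent from 1 toward 0.2 as training progresses
        # (schedule follows Stickland & Murray, 2019).
        alpha = 1.0 - 0.8 * (epoch - 1) / max(total_epochs - 1, 1)
        probs = [n ** alpha for n in dataset_sizes]
    else:
        raise ValueError(f"unknown sampling strategy: {strategy}")
    total = sum(probs)
    return [p / total for p in probs]

# Example: pick the next task to train on (dataset sizes are illustrative).
sizes = [8544, 141498, 6040]           # SST, Quora, STS
probs = task_probs(sizes, "anneal", epoch=5, total_epochs=25)
task = random.choices(["sst", "quora", "sts"], weights=probs, k=1)[0]
```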
- Adaptation modules
  - `--config_path`: path to a config file that defines the adaptation modules (PAL, adapter, prefix) and their hyperparameters.
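For orientation, a minimal sketch of the bottleneck-adapter idea configured here: a small down-/up-projection with a residual connection, trained while the pretrained BERT weights stay frozen (PALs and prefixes are analogous small trainable modules). The class name and default sizes are illustrative, not the repository's implementation:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a
    residual connection, inserted after a BERT sub-layer."""

    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # Residual connection keeps the frozen BERT output intact at init.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Example: adapt a batch of token representations of shape (batch, seq, hidden).
adapter = Adapter()
out = adapter(torch.randn(16, 128, 768))
```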
- Model architecture
  The following arguments define model-architecture variants based on Sentence-BERT.
  - `--similarity_classifier_type`, `--paraphrase_classifier_type`: apply a linear layer to the pair representation or compute the cosine similarity between the two sentence representations.
  - `--pooling_type`: use the `[CLS]` token, mean pooling, or max pooling over token embeddings.
  - `--classification_concat_type`: whether to add an absolute-difference term `|u - v|` when concatenating the two sentence representations.
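A minimal sketch of how these options combine in a Sentence-BERT-style head, assuming sentence representations `u` and `v` produced by a shared BERT encoder; the helpers `pool` and `pair_score` and the shapes in the example are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pool(hidden_states, attention_mask, pooling_type="mean"):
    """Collapse token embeddings (batch, seq, hidden) into one sentence vector."""
    if pooling_type == "cls":
        return hidden_states[:, 0]
    mask = attention_mask.unsqueeze(-1).float()
    if pooling_type == "mean":
        return (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    if pooling_type == "max":
        return hidden_states.masked_fill(mask == 0, -1e9).max(1).values
    raise ValueError(pooling_type)

def pair_score(u, v, classifier, classifier_type="linear", concat_type="add-abs"):
    """Score a sentence pair with a linear head over the concatenated
    representations, or directly with cosine similarity."""
    if classifier_type == "cosine-similarity":
        return F.cosine_similarity(u, v, dim=-1)
    feats = [u, v] if concat_type == "naive" else [u, v, torch.abs(u - v)]
    return classifier(torch.cat(feats, dim=-1)).squeeze(-1)

# Example: mean pooling, linear head over (u, v, |u - v|) with hidden size 768.
classifier = nn.Linear(3 * 768, 1)
u = pool(torch.randn(8, 64, 768), torch.ones(8, 64), "mean")
v = pool(torch.randn(8, 64, 768), torch.ones(8, 64), "mean")
score = pair_score(u, v, classifier, "linear", "add-abs")
```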
Finetune/Pretrain | Backbone | Adaptation | Trainable Params (M) | SST | Quora | STS | Avg |
---|---|---|---|---|---|---|---|
Pretrain | Base BERT | - | 3.0 | 41.1 | 67.5 | 27.2 | 45.3 |
Pretrain | Base BERT | PAL | 8.4 | 52.0 | 73.9 | 34.9 | 53.6 |
Pretrain | Base BERT | Prefix | 3.2 | 50.5 | 72.3 | 36.7 | 53.2 |
Pretrain | Base BERT | Adapter | 10.2 | 50.3 | 75.5 | 33.8 | 53.2 |
Pretrain | Sentence-BERT | - | 2.4 | 45.0 | 72.2 | 49.5 | 55.6 |
Pretrain | Sentence-BERT | PAL | 7.8 | 48.1 | 73.1 | 74.4 | 65.2 |
Pretrain | Sentence-BERT | Prefix | 2.6 | 47.0 | 77.5 | 72.3 | 65.6 |
Pretrain | Sentence-BERT | Adapter | 9.6 | 43.7 | 76.5 | 68.1 | 62.8 |
Finetune | Base BERT | - | 112.4 | 50.0 | 81.5 | 43.7 | 58.4 |
Finetune | Base BERT | PAL | 117.9 | 50.6 | 79.2 | 47.6 | 59.2 |
Finetune | Base BERT | Prefix | 112.7 | 49.9 | 82.9 | 53.6 | 62.1 |
Finetune | Base BERT | Adapter | 119.7 | 51.5 | 81.6 | 46.2 | 59.8 |
Finetune | Sentence-BERT | - | 111.8 | 49.9 | 82.3 | 75.9 | 69.4 |
Finetune | Sentence-BERT | PAL | 117.3 | 51.2 | 83.4 | 74.6 | 69.7 |
Finetune | Sentence-BERT | Prefix | 112.1 | 50.5 | 81.9 | 77.5 | 70.0 |
Finetune | Sentence-BERT | Adapter | 119.7 | 50.5 | 79.3 | 72.7 | 67.5 |
Ensemble x3 | - | - | - | 53.2 | 83.5 | 78.5 | 71.1 |
The BERT implementation part of the project was adapted from the "minbert" assignment developed at Carnegie Mellon University's CS11-711 Advanced NLP, created by Shuyan Zhou, Zhengbao Jiang, Ritam Dutt, Brendon Boldt, Aditya Veerubhotla, and Graham Neubig.
Parts of the code are from the `transformers` library (Apache License 2.0).