Using Adapters for BERT
pip install adapter-transformers
I strongly advise creating a separate environment for this, as we don't want other transformer/BERT models to be affected by this installation. FYI, what I did was simply copy the current virtual-env directory (e.g. `A`) to the new name you want (e.g. `B`) and then modify the `B/bin/activate` file.
Adding a new adapter to a loaded model is done as follows:
from transformers import BertModel

# load pretrained BERT (or RoBERTa)
model = BertModel.from_pretrained('bert-base-uncased')

model.add_adapter(
    adapter_task_name,
    config=adapter_config_name
)
where `adapter_task_name` is simply a string that you assign (e.g. 'scorenn', 'tasknn') and `adapter_config_name` is one of the following strings: "pfeiffer", "houlsby", "pfeiffer+inv", "houlsby+inv". For our research these different types are not so important, and we can simply use "pfeiffer".
For our usage we want two adapters, one for 'tasknn' and one for 'scorenn', so we will use the names 'scorenn' and 'tasknn' throughout this documentation.
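Concretely (a minimal sketch, assuming the `model` loaded above and the "pfeiffer" config), adding the two adapters looks like this:

# add one adapter per component, both with the "pfeiffer" config
model.add_adapter('scorenn', config='pfeiffer')
model.add_adapter('tasknn', config='pfeiffer')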
- For an adapter to participate in the forward/backward pass, we have to activate it. This is how it's usually done:
model.set_active_adapters(adapter_task_name)
The usual usage is one adapter at a time, but for our purposes we want both the 'scorenn' adapter and the 'tasknn' adapter to participate. Thus, we can use the Parallel composition block from adapter-hub (where `ac` is the adapter composition module; see the sketch below) as follows:

model.active_adapters = ac.Parallel(adapter1, adapter2)
The adapter-hub recently added unit tests for the Parallel option upon my request (nice!).
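Putting this together for our two adapters (a sketch; depending on the adapter-transformers version, the composition module may live at `transformers.adapters.composition`):

import transformers.adapters.composition as ac  # composition blocks: Parallel, Stack, ...

# run both adapters side by side in every forward pass
model.active_adapters = ac.Parallel('scorenn', 'tasknn')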
Calling

model.train_adapter(adapter_task_name)

makes only the adapter parameters associated with `adapter_task_name` backpropagatable.
As we want to backpropagate into each adapter (scorenn, tasknn) in an alternating fashion, we can simply do the following in the place where we used to exclusively activate the `scorenn` and `tasknn` parameters.
# only finetune scorenn, freezing tasknn
model.train_adapter('scorenn')
...
# only finetune tasknn, freezing scorenn
model.train_adapter('tasknn')
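As a rough sketch of how this alternation could sit inside a training loop (with `ac` imported as above; the loss helpers `compute_scorenn_loss` / `compute_tasknn_loss`, the `optimizer`, and `data_loader` are placeholders, not part of the library; some adapter-transformers versions also reset the active-adapter setup inside `train_adapter`, hence the re-assignment of `Parallel`):

for batch in data_loader:
    # phase 1: update only the scorenn adapter
    model.train_adapter('scorenn')
    model.active_adapters = ac.Parallel('scorenn', 'tasknn')  # keep both adapters in the forward pass
    compute_scorenn_loss(model, batch).backward()
    optimizer.step()
    optimizer.zero_grad()

    # phase 2: update only the tasknn adapter
    model.train_adapter('tasknn')
    model.active_adapters = ac.Parallel('scorenn', 'tasknn')
    compute_tasknn_loss(model, batch).backward()
    optimizer.step()
    optimizer.zero_grad()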
To work with AllenNLP, I've written the wrapper files `cached_transformer_adapters.py` and `pretrained_transformer_adpater_embedder` in the `feat/adapter` branch.
This wrapper covers the basic addition and loading of adapters, but it does not handle the Parallel functionality or the freezing/unfreezing of specific adapters. The latter needs to be taken care of inside the training loop, and where to perform the former is a design choice.
Saving and loading adapters can be done as shown in the tutorial. I found it odd that you don't need the `adapter_task_name` (which in this example is 'sst-2') when loading the adapter, but I guess it is what it is.
# save model
model.save_pretrained('./path/to/model/directory/')
# save adapter
model.save_adapter('./path/to/adapter/directory/', 'sst-2')

# load model
model = BertModel.from_pretrained('./path/to/model/directory/')
# load adapter
model.load_adapter('./path/to/adapter/directory/')
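For our setup the same calls are simply repeated per adapter name (a sketch; the directory paths are placeholders):

# save both of our adapters
model.save_adapter('./adapters/scorenn/', 'scorenn')
model.save_adapter('./adapters/tasknn/', 'tasknn')

# load them back into a freshly loaded model
model = BertModel.from_pretrained('./path/to/model/directory/')
model.load_adapter('./adapters/scorenn/')
model.load_adapter('./adapters/tasknn/')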