This project requires Python 3.8 (!)
1. Install bigbio

    git clone git@github.com:bigscience-workshop/biomedical.git
    cd biomedical
    pip install -e .
    cd ..
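    A quick, optional sanity check that the editable install worked. This assumes the BigBioConfigHelpers class lives in bigbio.dataloader, as in current versions of the biomedical repo:

    ```python
    # Optional sanity check for the bigbio install.
    # BigBioConfigHelpers enumerates the dataset configs shipped with the repo
    # (its location and API are assumed from current bigbio versions).
    from bigbio.dataloader import BigBioConfigHelpers

    conhelps = BigBioConfigHelpers()
    print(f"Found {len(conhelps)} BigBIO dataset configs")
    ```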
2. Install PyTorch

    conda install pytorch==1.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
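    To confirm that the expected version is installed and that CUDA is visible:

    ```python
    # Verify the PyTorch install and GPU visibility.
    import torch

    print(torch.__version__)          # expected: 1.8.0
    print(torch.cuda.is_available())  # True if the CUDA 11.1 toolkit matches your driver
    print(torch.cuda.device_count())  # number of GPUs visible to this process
    ```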
3. Install our fork of allennlp v1.3 with the DDP + gradient accumulation patch backported

    pip install git+https://github.com/leonweber/allennlp
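    An optional check that the patched fork is the version being imported:

    ```python
    # The fork is based on allennlp 1.3, so the reported version should be on the 1.3 line.
    import allennlp

    print(allennlp.__version__)
    ```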
4. Clone our fork of Machamp:

    git clone https://github.com/leonweber/machamp.git
5. Clone the biomuppet repository and install dependencies

    git clone git@github.com:leonweber/biomuppet.git
    pip install --no-deps -r biomuppet/requirements.txt
6. Generate the Machamp training data

    cd biomuppet; bash generate_machamp_data.sh [MACHAMP_ROOT_PATH]; cd ..
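    To see which datasets ended up in a Machamp dataset config such as configs/bigbio_debug.json (used in the next step), here is a small sketch, assuming the config is a JSON object keyed by dataset name as in Machamp's bundled dataset configs:

    ```python
    # List the datasets in a generated Machamp dataset config.
    # Assumes the config is a JSON object keyed by dataset name.
    import json
    from pathlib import Path

    config_path = Path("configs/bigbio_debug.json")  # run from MACHAMP_ROOT_PATH
    datasets = json.loads(config_path.read_text())

    print(f"{len(datasets)} datasets in {config_path}:")
    for name in sorted(datasets):
        print(" -", name)
    ```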
7a. Run Machamp training (single node)

    cd [MACHAMP_ROOT_PATH]; python train.py --dataset_config configs/bigbio_debug.json
7b. Run Machamp training (multiple nodes)

    Set the correct distributed settings in [MACHAMP_ROOT_PATH]/configs/params.json:

    "distributed": {
        "cuda_devices": [0, 1, 2, 3],  # all nodes must have the same number of GPUs for AllenNLP multi-node training to work
        "master_address": "[ADDRESS of main node]",
        "master_port": "29500",  # any free port
        "num_nodes": [Total number of nodes]
    }
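    Before launching, an optional pre-flight check that the local node matches the configured GPU count (assuming the real params.json is plain JSON, without the explanatory comments shown above):

    ```python
    # Compare the "distributed" block in params.json against the GPUs visible locally.
    import json
    import torch

    with open("configs/params.json") as f:  # run from MACHAMP_ROOT_PATH
        dist = json.load(f)["distributed"]

    local_gpus = torch.cuda.device_count()
    if len(dist["cuda_devices"]) != local_gpus:
        print(f"Warning: config lists {len(dist['cuda_devices'])} GPUs, but {local_gpus} are visible on this node")
    print(f"master: {dist['master_address']}:{dist['master_port']}, num_nodes: {dist['num_nodes']}")
    ```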
    Start training by running the same command on every node, passing each node's rank (0 on the main node):

    cd [MACHAMP_ROOT_PATH]; python train.py --dataset_config configs/bigbio_debug.json --node_rank [rank of local node]