
Training Time Issue #83

Open
imethanlee opened this issue Mar 15, 2022 · 4 comments

Comments

@imethanlee

Hi,

What is the expected time to train a PET model on the yelp_full dataset (with the default arguments)? I started training the day before yesterday on an RTX 3090 GPU and it is still running.

Thanks.

@timoschick
Owner

I don't know how efficient RTX 3090s are, but with a single Nvidia GeForce 1080Ti, training PET (not iPET) with the default parameters is a matter of a few hours. In case you haven't fixed the issue yourself yet, could you provide me with the exact command that you've used to train the model? Also, did you check (e.g., with nvidia-smi) whether the GPU is actually being used?
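For anyone hitting the same problem, a minimal sketch for verifying that PyTorch actually sees and uses the GPU, run from the same environment that launches pet/cli.py (the tensor sizes are arbitrary; watching nvidia-smi in a second terminal works just as well):

# Sanity check: can PyTorch see the GPU, and does a small matmul run on it?
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x                      # should show up as utilization in nvidia-smi
    torch.cuda.synchronize()       # wait for the kernel to finish before printing
    print("GPU matmul OK:", tuple(y.shape))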

@jmcrey

jmcrey commented Apr 19, 2022

Hi @timoschick,

I am having the same issue here. I started training on an RTX 3090 yesterday and it is still running. The command I am using is as follows:

python pet/cli.py \
    --method pet \
    --pattern_ids 0 3 5 \
    --data_dir ${DATA_DIR} \
    --model_type albert \
    --model_name_or_path albert-xxlarge-v2 \
    --task_name boolq \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --pet_per_gpu_eval_batch_size 8 \
    --pet_per_gpu_train_batch_size 2 \
    --pet_gradient_accumulation_steps 8 \
    --pet_max_steps 250 \
    --pet_max_seq_length 256 \
    --pet_repetitions 3 \
    --sc_per_gpu_train_batch_size 2 \
    --sc_per_gpu_unlabeled_batch_size 2 \
    --sc_gradient_accumulation_steps 8 \
    --sc_max_steps 5000 \
    --sc_max_seq_length 256 \
    --sc_repetitions 1
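For a rough sense of the workload these flags imply (a sketch that assumes pet_max_steps and sc_max_steps count optimizer steps, each consuming batch_size × gradient_accumulation examples, which may not match the codebase exactly):

# Back-of-the-envelope example count for the command above (assumption-laden sketch).
pet_per_pattern = 2 * 8 * 250            # batch 2 * grad accum 8 * 250 steps = 4,000 examples
pet_total = pet_per_pattern * 3 * 3      # 3 pattern ids * 3 repetitions = 36,000 examples
sc_total = 2 * 8 * 5000                  # final classifier: 80,000 examples
print(pet_total, sc_total)               # hours of work on a working GPU, not days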

@jmcrey

jmcrey commented Apr 20, 2022

Just a heads up -- I bumped up the version of PyTorch to 1.8.0 and CUDA to 11.3 and that solved the performance issues. I am now able to run through the first 126 epochs in about 12 minutes compared to 1.5 hours. I am still waiting to see if this affects the results, but the performance is much better.
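For reference, a quick way to confirm which PyTorch/CUDA build is actually active in an environment (an older build without support for the 3090's Ampere architecture, or a CPU-only wheel, would explain this kind of slowdown):

# Print the PyTorch version and the CUDA toolkit it was built against.
import torch

print("torch:", torch.__version__)              # e.g. 1.8.0
print("built with CUDA:", torch.version.cuda)   # None means a CPU-only build
print("cuDNN:", torch.backends.cudnn.version())
print("GPU visible:", torch.cuda.is_available())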

@jacksonchen1998

jacksonchen1998 commented Feb 16, 2023

@jmcrey So, did the results turn out OK?

I'm now using a 1080 Ti with CUDA 11.5 and TensorRT, training for 3 epochs.
My pre-trained model is RoBERTa-large and the dataset is AG News; the other arguments are set to the defaults.
It looks like training takes about half a day.
