
# T5

The T5 model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.

This benchmark uses the Transformers-based implementation of the model. The original code can be found here.

## Speedup by using FastSeq

- Speed on a single NVIDIA V100 (16 GB)

| BatchSize            | 64             | 128            |
|----------------------|----------------|----------------|
| transformers_v4.12.0 | 9.5 samples/s  | OOM            |
| above + fastseq      | 23.3 samples/s | 31.7 samples/s |

## Model

`t5-base` from the Hugging Face Transformers model hub.
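For reference, here is a minimal sketch (not part of the original benchmark, and using plain Transformers without FastSeq) of loading `t5-base` from the hub and translating one sentence; the `translate English to Romanian:` prefix is T5's text-to-text task prefix:

```python
# Minimal sketch: load t5-base and translate a single sentence with plain Transformers.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# T5 uses task prefixes; this one selects English-to-Romanian translation.
text = "translate English to Romanian: The weather is nice today."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=4, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```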

## Task - WMT16 English-Romanian Translation

Download the data with these commands:

```bash
wget https://cdn-datasets.huggingface.co/translation/wmt_en_ro.tar.gz
tar -xzvf wmt_en_ro.tar.gz
export ENRO_DIR=${PWD}/wmt_en_ro
```

This should create a directory called `wmt_en_ro/` containing 6 files. A quick sanity check is sketched below.
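The following sketch assumes the archive follows the `{train,val,test}.{source,target}` layout implied by the generation command below:

```python
# Sketch: list the unpacked files and peek at the first validation pair.
import os
from pathlib import Path

data_dir = Path(os.environ.get("ENRO_DIR", "wmt_en_ro"))
print(sorted(p.name for p in data_dir.iterdir()))  # expect 6 files

src = (data_dir / "val.source").read_text(encoding="utf-8").splitlines()
tgt = (data_dir / "val.target").read_text(encoding="utf-8").splitlines()
print(len(src), "validation pairs")
print(src[0])
print(tgt[0])
```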

## Setting

```bash
$ fastseq-generate-for-transformers \
    t5-base \
    wmt_en_ro/val.source \
    out.summary \
    --reference_path wmt_en_ro/val.target \
    --device cuda \
    --bs BATCH_SIZE \
    --fp16 \
    --score_path out.score \
    --task translation_en_to_ro \
    --postprocess_workers 3 \
    --no_repeat_ngram_size 3
```

The baseline speed numbers are obtained by running the Transformers v4.12.0 code.

## Code Example

Refer to file.
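The referenced example file is not reproduced here; the snippet below is a hedged sketch of the typical FastSeq usage pattern, assuming that importing `fastseq` before `transformers` is what activates its generation optimizations (file paths and generation settings mirror the command above):

```python
# Hedged sketch: batched fp16 generation with FastSeq's optimizations applied.
import fastseq  # noqa: F401 -- assumed to patch transformers' generation when imported first
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").half().cuda().eval()

# Read one batch of source sentences and add T5's translation prefix.
with open("wmt_en_ro/val.source", encoding="utf-8") as f:
    batch = ["translate English to Romanian: " + line.strip() for line in f][:64]

inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True).to("cuda")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        num_beams=4,
        max_length=128,
        no_repeat_ngram_size=3,  # matches the CLI flag above
    )
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```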