An implementation of the encoder-decoder model with the global attention mechanism (Luong et al., 2015). Stacked multi-layer LSTM (Long Short-Term Memory) RNNs are used for both the encoder and the decoder, and the global attention mechanism and the input-feeding approach are employed. During training, you can use scheduled sampling (Bengio et al., 2015) to bridge the gap between training and inference for sequence prediction tasks.
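For reference, below is a minimal PyTorch sketch of the global (dot-score) attention described by Luong et al. (2015); the class and variable names are illustrative and do not mirror the actual classes in this repository.

```python
# Minimal sketch of Luong-style global attention (dot score) for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAttention(nn.Module):
    """Weighted sum of encoder states, followed by the attentional hidden state h~_t."""
    def __init__(self, hidden_size):
        super().__init__()
        # W_c projects [context; decoder state] to the attentional hidden state
        self.linear_out = nn.Linear(2 * hidden_size, hidden_size, bias=False)

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden: (batch, hidden), enc_outputs: (batch, src_len, hidden)
        scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)  # dot scores
        align = F.softmax(scores, dim=1)                                     # attention weights
        context = torch.bmm(align.unsqueeze(1), enc_outputs).squeeze(1)      # context vector
        attn_hidden = torch.tanh(self.linear_out(torch.cat([context, dec_hidden], dim=1)))
        # With input feeding, attn_hidden is concatenated with the next input embedding.
        return attn_hidden, align
```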
The `--train` and `--valid` options receive the path to a data file for training and validation, respectively. The data file must be in tab-separated values (TSV) format. If you need to use a GPU, set the `--gpu` option. The `--tf-ratio` option specifies the ratio of the supervised signal (teacher forcing with gold tokens) used in the decoding step during training.
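The sample files under `./sample_data` define the exact format; as a hypothetical illustration, a training TSV would typically contain one example per line, with the source sentence and its target sentence separated by a tab:

```
<source sentence>	<target sentence>
<source sentence>	<target sentence>
```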
```sh
python train.py \
  --gpu \
  --train ./sample_data/sample_train.tsv \
  --valid ./sample_data/sample_valid.tsv \
  --tf-ratio 0.5 \
  --savedir ./checkpoints
```
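As a rough sketch of what `--tf-ratio 0.5` implies during training (scheduled sampling), the decoder can choose at each step between the gold token and its own prediction. The helpers `decoder_step`, `embed`, and `generator` below are hypothetical placeholders, not the actual API of `train.py`.

```python
# Illustrative scheduled-sampling loop driven by a teacher-forcing ratio.
import random
import torch

def decode_for_training(decoder_step, embed, generator, gold, state, tf_ratio=0.5):
    # gold: (batch, tgt_len) reference token ids, starting with <bos>
    inputs = gold[:, 0]
    all_logits = []
    for t in range(1, gold.size(1)):
        hidden, state = decoder_step(embed(inputs), state)  # one decoder RNN step
        logits = generator(hidden)                          # (batch, vocab)
        all_logits.append(logits)
        if random.random() < tf_ratio:
            inputs = gold[:, t]                # supervised signal: feed the gold token
        else:
            inputs = logits.argmax(dim=1)      # feed the model's own prediction
    return torch.stack(all_logits, dim=1)
```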
The `--model` option receives the path to a model file generated by `train.py`. A text file that you want to translate is given to `--input`. If you need to use a GPU, set the `--gpu` option.
```sh
python translate.py \
  --gpu \
  --model ./checkpoints/checkpoint_best.pt \
  --input ./sample_data/sample_test.txt
```
- Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. Effective Approaches to Attention-based Neural Machine Translation. In EMNLP 2015, 1412–1421.
- Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. In NIPS 2015, 1171–1179.