MT Exercise 5: Byte Pair Encoding, Beam Search

This repo is a collection of scripts showing how to install JoeyNMT, download data, and train and evaluate models.

Requirements

  • This only works on a Unix-like system, with bash.

  • Python 3 must be installed on your system, i.e. the command python3 must be available.

  • Make sure virtualenv is installed on your system. To install it, run for example:

    pip install virtualenv
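
The requirements above can be checked before starting. The following is a minimal sketch, not part of this repo's scripts; the function name check_requirements is hypothetical, and it only tests whether each command is on the PATH, not which version is installed:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): report whether each
# given command is on PATH; return non-zero if any is missing.
check_requirements() {
    local missing=0
    local cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# The tools named in the requirements list.
check_requirements bash python3 virtualenv \
    || echo "Install the missing tools before continuing."
```

If any line reads "missing", install that tool first; the later steps assume all three are available.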

Steps

Clone this repository in the desired place:

git clone https://github.com/emmavdbold/mt-exercise-5

Create a new virtualenv that uses Python 3. Please make sure to run this command outside of any virtual Python environment:

./scripts/make_virtualenv.sh

Important: then activate the virtualenv by executing the source command that the script above prints.
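
Before running make_virtualenv.sh, it can help to confirm that no virtualenv is currently active. A minimal sketch, assuming the environment was activated through a standard virtualenv or venv activate script, which sets the VIRTUAL_ENV variable; the function name in_virtualenv is hypothetical, and other environment managers (for example conda) are not detected:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): succeeds if a
# virtualenv/venv is active, judged by the VIRTUAL_ENV variable
# that the standard activate script exports.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    echo "Active virtualenv: $VIRTUAL_ENV (run 'deactivate' first)"
else
    echo "No virtualenv active."
fi
```

The same check, run after sourcing the activation command, should report the newly created environment.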

Download and install required software:

./scripts/download_install_packages.sh

Download data:

./download_iwslt_2017_data.sh

The data is only minimally preprocessed, so you may want to tokenize it and apply any further preprocessing steps.
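
As an illustration of what a tokenization step does, here is a deliberately crude sketch that separates punctuation from words using only sed and tr. It is not the preprocessing used by this repo or by JoeyNMT, and the function name simple_tokenize is hypothetical; a dedicated tokenizer handles apostrophes, hyphens, numbers and language-specific rules far better:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): crude tokenizer that
# reads text on stdin and writes tokenized text to stdout.
# Caveat: every punctuation character is split off, so "don't"
# becomes "don ' t".
simple_tokenize() {
    # 1. Put spaces around each punctuation character.
    # 2. Squeeze runs of spaces into one.
    # 3. Trim leading and trailing spaces on each line.
    sed -E 's/([[:punct:]])/ \1 /g' \
        | tr -s ' ' \
        | sed -E 's/^ //; s/ $//'
}
```

It would be applied to each side of the parallel data separately, reading the raw file on stdin and redirecting stdout to a new file, so that source and target keep the same number of lines.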

Train a model:

./scripts/train.sh

The training process can be interrupted at any time, and the best checkpoint will always be saved.

Evaluate a trained model:

./scripts/evaluate.sh