This project implements a transformer-based machine translation model using PyTorch. The model is designed to translate text from one language to another, demonstrating the power of attention mechanisms in sequence-to-sequence tasks.
- Implements a full transformer architecture for translation
- Includes positional encoding for sequence order preservation (see the sketch after this list)
- Uses multi-head attention mechanism
- Supports customizable model parameters (layers, heads, etc.)
- Includes a basic tokenization system
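As a quick illustration of the positional encoding mentioned in the feature list, here is a minimal sketch of the standard sinusoidal scheme from the original paper. The function name and implementation details are illustrative assumptions and may differ from the actual code in `main.py`.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Return a (max_len, d_model) matrix of sinusoidal position encodings.

    Minimal sketch of the scheme from "Attention Is All You Need";
    the repository's own implementation may differ in detail.
    """
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)            # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))                        # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe

# The encoding is added to the token embeddings before the first encoder layer:
# x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```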
- Python 3.7+
- PyTorch 1.7+
- NumPy
- Clone the repository:

  git clone https://github.com/frost-head/translation.git
  cd translation

- Install the required packages:

  pip install torch numpy
To use the pre-trained weights for this model:
- Download the weights from this Google Drive link.
- Create a `checkpoints` folder in the root directory of the project if it doesn't exist:

  mkdir checkpoints

- Move the downloaded weight files into the `checkpoints` folder.
To train the model:
python main.py
To use the pre-trained weights, make sure they are in the `checkpoints` folder before running the script.
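As a rough sketch of how such pre-trained weights might be loaded, assuming a checkpoint saved with `torch.save`; the file name `checkpoints/model.pt` and the state-dict layout below are placeholders, not guaranteed to match what `main.py` produces.

```python
import torch

# Hypothetical file name; substitute the actual checkpoint downloaded above.
checkpoint = torch.load("checkpoints/model.pt", map_location="cpu")

# If the checkpoint is a plain state dict:
# model.load_state_dict(checkpoint)

# If it wraps the state dict in a larger dict (a common convention):
# model.load_state_dict(checkpoint["model_state_dict"])
```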
The transformer model consists of:
- An encoder stack with multiple layers of self-attention and feed-forward networks
- A decoder stack with self-attention, encoder-decoder attention, and feed-forward networks
- Multi-head attention mechanism for capturing different aspects of the input (sketched below this list)
- Positional encoding to maintain sequence order information
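For readers new to the architecture, the multi-head attention mechanism noted above can be sketched as follows. This is a generic, from-scratch illustration of scaled dot-product attention split across heads, not the exact module used in this repository; masking and dropout details in particular may differ.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal multi-head attention sketch (illustrative only)."""

    def __init__(self, d_model: int, nhead: int):
        super().__init__()
        assert d_model % nhead == 0
        self.d_head = d_model // nhead
        self.nhead = nhead
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch, q_len, _ = query.shape
        k_len = key.shape[1]

        # Project and split into heads: (batch, nhead, seq_len, d_head)
        q = self.q_proj(query).view(batch, q_len, self.nhead, self.d_head).transpose(1, 2)
        k = self.k_proj(key).view(batch, k_len, self.nhead, self.d_head).transpose(1, 2)
        v = self.v_proj(value).view(batch, k_len, self.nhead, self.d_head).transpose(1, 2)

        # Scaled dot-product attention per head
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask, float("-inf"))
        weights = scores.softmax(dim=-1)

        # Recombine heads and project back to d_model
        out = (weights @ v).transpose(1, 2).reshape(batch, q_len, -1)
        return self.out_proj(out)
```

Self-attention uses the same tensor for query, key, and value (`attn(x, x, x)`), while encoder-decoder attention passes the decoder states as the query and the encoder outputs as key and value.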
You can adjust various hyperparameters in `main.py` to experiment with different model configurations:
- `d_model`: Dimension of the model
- `nhead`: Number of attention heads
- `num_encoder_layers` and `num_decoder_layers`: Number of layers in the encoder and decoder
- `dim_feedforward`: Dimension of the feed-forward network
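These hyperparameter names match the arguments of PyTorch's built-in `torch.nn.Transformer`, so a configuration could plausibly be wired up as sketched below. The numeric values are the illustrative defaults from the original paper, not necessarily the ones set in `main.py`.

```python
import torch.nn as nn

# Illustrative values only; the actual defaults live in main.py.
d_model = 512
nhead = 8
num_encoder_layers = 6
num_decoder_layers = 6
dim_feedforward = 2048

# The same parameter names appear in torch.nn.Transformer, so a model built
# on that module could be configured like this:
model = nn.Transformer(
    d_model=d_model,
    nhead=nhead,
    num_encoder_layers=num_encoder_layers,
    num_decoder_layers=num_decoder_layers,
    dim_feedforward=dim_feedforward,
)
```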
Contributions to improve the model or extend its functionality are welcome. Please feel free to submit pull requests or open issues for any bugs or feature requests.
This project is open-source and available under the MIT License.
- This implementation is inspired by the original "Attention Is All You Need" paper by Vaswani et al.
- Thanks to the PyTorch team for their excellent framework and documentation.