This repo is a work in progress toward a minimal implementation of the transformer architecture with multi-head self-attention, built for my own curiosity and deeper understanding. The model is trained and evaluated on a toy dataset where the task is to reverse a sequence of integers.
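For concreteness, the toy task can be generated along the following lines. This is a minimal sketch, not the repo's actual data pipeline; the `make_batch` helper and its default batch size, sequence length, and vocabulary size are illustrative assumptions:

```python
import torch

def make_batch(batch_size=32, seq_len=10, vocab_size=20):
    # Random integer tokens; the sizes here are illustrative defaults,
    # not the settings used by this repo's training script.
    src = torch.randint(1, vocab_size, (batch_size, seq_len))
    # The target is simply the source sequence reversed.
    tgt = torch.flip(src, dims=[1])
    return src, tgt
```

For example, `make_batch()` yields a pair of `(32, 10)` integer tensors where each target row is its source row reversed.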
- Clone the repository:

  ```bash
  git clone https://github.com/naivoder/AttentionIsAllYouNeed.git
  cd AttentionIsAllYouNeed
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the training script:

  ```bash
  python main.py
  ```
Special thanks to Aladdin Persson for his explanation of how `torch.einsum` can express the attention mechanism.
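For readers curious about that detail, here is a minimal sketch of multi-head self-attention written with `torch.einsum`, in the spirit of that explanation. The `SelfAttention` class, its layer names, and its dimension labels are illustrative assumptions rather than this repo's exact implementation:

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must divide evenly across heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.to_q = nn.Linear(embed_dim, embed_dim)
        self.to_k = nn.Linear(embed_dim, embed_dim)
        self.to_v = nn.Linear(embed_dim, embed_dim)
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        n, l, _ = x.shape  # (batch, seq_len, embed_dim)
        # Project and split into heads: (batch, seq_len, num_heads, head_dim)
        q = self.to_q(x).reshape(n, l, self.num_heads, self.head_dim)
        k = self.to_k(x).reshape(n, l, self.num_heads, self.head_dim)
        v = self.to_v(x).reshape(n, l, self.num_heads, self.head_dim)
        # "nqhd,nkhd->nhqk": for each head h, score query position q
        # against key position k by contracting over head_dim.
        scores = torch.einsum("nqhd,nkhd->nhqk", q, k) / self.head_dim ** 0.5
        attn = scores.softmax(dim=-1)
        # "nhqk,nkhd->nqhd": weight the values by attention,
        # summing over key positions.
        out = torch.einsum("nhqk,nkhd->nqhd", attn, v)
        # Concatenate heads and apply the output projection.
        return self.out(out.reshape(n, l, self.num_heads * self.head_dim))
```

For example, `SelfAttention(embed_dim=64, num_heads=4)(torch.randn(2, 10, 64))` returns a `(2, 10, 64)` tensor. The appeal of `einsum` here is that the subscript strings make the contraction dimensions explicit, avoiding a chain of transposes and batched matrix multiplies.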