This repository contains an implementation of the seminal paper "Attention Is All You Need" by Vaswani et al. (2017), which introduced the Transformer architecture. The Transformer revolutionized deep learning by relying entirely on attention mechanisms to draw global dependencies between input and output, leading to state-of-the-art performance on machine translation and many other NLP tasks.
The main goal of this project is to recreate the core ideas of the Transformer model from scratch using PyTorch, with a focus on implementing the scaled dot-product attention and multi-head attention mechanisms. This implementation follows the paper closely, providing insights into the fundamental building blocks of the architecture.
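As a rough illustration of the two mechanisms this project focuses on, here is a minimal PyTorch sketch of scaled dot-product attention (Eq. 1 in the paper, softmax(QK^T / sqrt(d_k))V) and the multi-head wrapper around it. The function and class names below are illustrative, not necessarily the identifiers used in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)
    if mask is not None:
        # Positions where mask == 0 receive ~zero attention weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v)

class MultiHeadAttention(nn.Module):
    # Hypothetical module name; d_model=512 and num_heads=8 follow the paper's base model.
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch = query.size(0)

        def split_heads(x):
            # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_head)
            return x.view(batch, -1, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.w_q(query))
        k = split_heads(self.w_k(key))
        v = split_heads(self.w_v(value))
        out = scaled_dot_product_attention(q, k, v, mask)
        # Concatenate heads back to (batch, seq_len, d_model), then apply the output projection.
        out = out.transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_head)
        return self.w_o(out)
```

A quick self-attention usage example (query, key, and value all come from the same sequence):

```python
x = torch.randn(2, 10, 512)   # (batch, seq_len, d_model)
mha = MultiHeadAttention()
print(mha(x, x, x).shape)     # torch.Size([2, 10, 512])
```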
References:
Original paper: "Attention Is All You Need", Vaswani et al. (2017)
Learning video: "Pytorch Transformers from Scratch (Attention is all you need)" by Aladdin Persson