This repository is an effort to implement LLMs from scratch using PyTorch. It is inspired by Hugging Face Transformers, The Annotated Transformer, and The Illustrated Transformer. The goal is to understand the architecture and implementation details of LLMs. The repository is a work in progress and will be updated regularly.

Table of Contents

  • Introduction
  • Architecture
  • Implementation
  • Usage
  • References

Introduction

Large language models (LLMs) are neural networks trained to predict the next token (roughly, a word or word piece) in a sequence given the tokens before it. They are used in NLP tasks such as text generation, machine translation, and sentiment analysis. LLMs are based on the Transformer architecture, which uses a self-attention mechanism to process the input sequence. The original Transformer is composed of an encoder, which processes the input sequence, and a decoder, which generates the output sequence; many LLMs use only the decoder stack. LLMs are trained on a large corpus of text, from which they learn the statistical patterns needed to generate text.
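The next-token objective can be made concrete with a small sketch. This is plain Python and is not taken from the repository: the whitespace tokenizer, the `next_token_pairs` helper, and the `context_len` parameter are all assumptions chosen for illustration.

```python
def next_token_pairs(token_ids, context_len):
    """Return (context, target) pairs for next-token prediction.

    Each target is the token that follows its context; the context is
    truncated to the most recent `context_len` tokens.
    """
    pairs = []
    for i in range(1, len(token_ids)):
        context = token_ids[max(0, i - context_len):i]
        pairs.append((context, token_ids[i]))
    return pairs


# Toy whitespace tokenizer (an assumption; real LLMs use subword tokenizers).
text = "the cat sat on the mat"
vocab = {word: idx for idx, word in enumerate(sorted(set(text.split())))}
token_ids = [vocab[word] for word in text.split()]

pairs = next_token_pairs(token_ids, context_len=3)
for context, target in pairs:
    print(context, "->", target)
```

During training, a model would see each context and be penalized according to how little probability it assigns to the corresponding target.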

Architecture

LLMs are based on the Transformer architecture, which in its original form is composed of an encoder and a decoder. The encoder processes the input sequence and produces a representation of it, which the decoder uses to generate the output sequence. Both are built from stacked layers, each combining a self-attention sublayer with a position-wise feed-forward network. Self-attention computes attention weights between every pair of positions within a single sequence, and each position's output is the weighted sum of the value vectors of the positions it attends to. In the encoder-decoder setup, the decoder additionally uses cross-attention, whose weights relate output positions to the encoder's representation of the input.
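As a rough illustration of self-attention, here is a minimal single-head sketch of scaled dot-product attention. It is written in plain Python with nested lists rather than PyTorch so that the arithmetic stays visible; it is not the code in transformer.py, and the function names and the `causal` flag are assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def self_attention(x, w_q, w_k, w_v, causal=False):
    """Single-head scaled dot-product self-attention.

    x is (seq_len x d_model); w_q, w_k, w_v are (d_model x d_k).
    Returns the attended outputs and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    if causal:
        # Position i may only attend to positions j <= i.
        scores = [[s if j <= i else float("-inf") for j, s in enumerate(row)]
                  for i, row in enumerate(scores)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity, causal=True)
```

With `causal=True`, the first position can only attend to itself, which is the masking a decoder-only model relies on during next-token training. A PyTorch version would replace the list comprehensions with tensor operations.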

Implementation

The implementation uses PyTorch, a popular deep learning library for Python. It follows the Transformer architecture described above and is written in a modular way, so that the architecture and the training process are easy to customize. It is inspired by The Annotated Transformer and The Illustrated Transformer, which explain the Transformer architecture and its implementation in detail.

Usage

The repository contains the following files:

  • transformer.py: Contains the implementation of the Transformer architecture.
  • train.py: Contains the training script for the LLMs.
  • generate.py: Contains the generation script for the LLMs.

To train the LLMs, run the following command:

python train.py

To generate text using the trained LLMs, run the following command:

python generate.py
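The interface of generate.py is not documented here, so the following is only a sketch of how autoregressive generation generally works, using greedy decoding. `greedy_generate`, `toy_model`, and `eos_id` are hypothetical names, and the toy model stands in for a trained network that returns one score per vocabulary entry.

```python
def greedy_generate(next_token_logits, prompt_ids, max_new_tokens, eos_id=None):
    """Extend prompt_ids one token at a time, always picking the top score.

    next_token_logits is any callable that maps a list of token ids to a
    list of scores, one per vocabulary entry.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break  # Stop once the end-of-sequence token is produced.
    return ids


def toy_model(ids, vocab_size=5):
    """Hypothetical stand-in model: favors (last token + 1) mod vocab_size."""
    logits = [0.0] * vocab_size
    logits[(ids[-1] + 1) % vocab_size] = 1.0
    return logits


generated = greedy_generate(toy_model, [0], max_new_tokens=6, eos_id=4)
print(generated)
```

Greedy decoding is the simplest choice; sampling strategies such as temperature or top-k sampling replace the `max` step with a random draw from the score distribution.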

References

Blogs

Papers

Repositories

Courses