NumpyGPT

NumpyGPT is a PyTorch-like implementation of a GPT (Generative Pre-trained Transformer) model using NumPy, with CuPy for GPU acceleration. The project aims to provide a lightweight, educational version of the GPT architecture suitable for learning and experimentation, and is inspired by NanoGPT.

Features

  • Implements core GPT architecture components
  • Uses NumPy for CPU operations and CuPy for GPU acceleration
  • Includes modules for embedding, self-attention, dropout, and layer normalization
  • Supports configurable model parameters (e.g., vocabulary size, embedding size, number of layers; see the configuration sketch after this list)
  • Provides a training loop with learning rate scheduling and gradient accumulation
  • Includes text generation capabilities
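
As a rough illustration of what such a configuration might look like, here is a minimal sketch. The field names and default values below are hypothetical and may not match the actual names used in model_gpu.py:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    """Hypothetical configuration; the real parameter names in model_gpu.py may differ."""
    vocab_size: int = 50304   # size of the token vocabulary
    block_size: int = 256     # maximum context length in tokens
    n_layer: int = 6          # number of transformer blocks
    n_head: int = 6           # attention heads per block
    n_embd: int = 384         # embedding dimensionality
    dropout: float = 0.2      # dropout probability used for regularization

# Override a few fields for a smaller experiment.
config = GPTConfig(n_layer=4, n_embd=256)
print(config)
```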

Requirements

  • NumPy
  • CuPy (for GPU acceleration)

Usage

  • python3 train.py: starts training
  • python3 sample.py: runs inference (text generation); by default the model is loaded from the latest checkpoint

Project Structure

The project consists of several Python files that implement different components of the GPT model:

  • model_gpu.py: Contains the main GPT model implementation
  • train.py: Contains the training code
  • utils.py: Helper functions for importing and exporting model weights (a generic checkpointing illustration follows this list)
  • sample.py: Inference code
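
The exact helpers in utils.py are not described here; as a generic illustration of how a NumPy model's weights can be exported and imported, a name-to-array dictionary can be written to a single .npz file. The parameter names and shapes below are made up for the example:

```python
import numpy as np

# Hypothetical parameter dictionary; the real layout used by utils.py may differ.
params = {
    "wte": np.random.randn(1000, 64).astype(np.float32),  # token embedding table
    "wpe": np.random.randn(128, 64).astype(np.float32),   # position embedding table
}

# Export: store each array under its parameter name in one .npz archive.
np.savez("checkpoint.npz", **params)

# Import: reload the archive and rebuild the name -> array mapping.
loaded = dict(np.load("checkpoint.npz"))
assert all(np.array_equal(params[k], loaded[k]) for k in params)
```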

Model Architecture

NumpyGPT implements a transformer-based language model with the following key components (a minimal NumPy sketch of self-attention follows the list):

  • Token and position embeddings
  • Multi-head self-attention mechanism
  • Feed-forward neural networks
  • Layer normalization
  • Dropout for regularization
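
To make the attention component concrete, here is a minimal NumPy sketch of single-head causal scaled dot-product self-attention. It is a simplified illustration rather than the code from model_gpu.py; on a GPU, cupy can be dropped in for numpy since it mirrors the same API:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (T, C) input sequence."""
    T, _ = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv            # project to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])     # scaled dot-product similarities, (T, T)
    mask = np.triu(np.ones((T, T), dtype=bool), 1)
    scores = np.where(mask, -np.inf, scores)    # block attention to future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key dimension
    return weights @ v                          # weighted sum of values, (T, head_dim)

rng = np.random.default_rng(0)
T, C, H = 8, 32, 16                             # sequence length, model dim, head dim
x = rng.standard_normal((T, C))
out = causal_self_attention(x, *(rng.standard_normal((C, H)) for _ in range(3)))
print(out.shape)                                # (8, 16)
```

A full multi-head layer runs several such heads in parallel and concatenates their outputs before a final linear projection.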

Training

The model supports training with the following (a toy sketch of the schedule and accumulation loop follows this list):

  • Customizable learning rate scheduling
  • Gradient accumulation for effective batch size control
  • Checkpointing for saving and resuming training
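
As an illustration of how the first two pieces typically fit together, the sketch below shows a linear-warmup-plus-cosine-decay schedule (in the style of NanoGPT) and a gradient-accumulation loop on a toy quadratic loss. All function names and hyperparameter values here are made up for the example and are not necessarily what train.py uses:

```python
import numpy as np

def get_lr(step, max_lr=0.1, min_lr=0.01, warmup=20, decay_steps=300):
    """Linear warmup followed by cosine decay to min_lr (toy hyperparameters)."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    if step > decay_steps:
        return min_lr
    ratio = (step - warmup) / (decay_steps - warmup)
    return min_lr + 0.5 * (1.0 + np.cos(np.pi * ratio)) * (max_lr - min_lr)

# Gradient accumulation: average gradients over several micro-batches
# before applying a single parameter update, emulating a larger batch size.
rng = np.random.default_rng(0)
w = 0.0                        # single scalar "parameter" for the toy loss (w - 3)^2
accum_steps = 4                # micro-batches accumulated per optimizer update
for step in range(300):
    grad = 0.0
    for _ in range(accum_steps):
        noise = rng.normal(scale=0.1)       # stand-in for per-micro-batch noise
        grad += 2.0 * (w - 3.0) + noise     # gradient of the toy loss
    w -= get_lr(step) * grad / accum_steps  # one update per accumulated batch
print(round(w, 2))                          # ends close to the optimum at 3.0
```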
