Transformer Language Model: a PyTorch implementation from scratch
It's intended to follow the GPT architecture (sketched in code after this list), i.e.:
- decoder-only Transformer
- learned positional encoding
- dense multi-head attention
- Normal(0, 0.02) initialization
- d_mlp = 4 * d_model
- embedding, residual and softmax dropouts
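As a rough sketch of what those choices look like in code (the hyperparameter values and the `init_weights` helper below are illustrative, not taken from gpt/):

```python
import torch.nn as nn

# Illustrative hyperparameters mirroring the list above (values are arbitrary).
D_MODEL = 256          # width of the residual stream / embeddings
N_HEADS = 8            # dense multi-head attention
D_MLP = 4 * D_MODEL    # MLP hidden width, following GPT
DROPOUT = 0.1          # rate shared by embedding, residual and softmax dropouts

def init_weights(module: nn.Module) -> None:
    """GPT-style Normal(0, 0.02) initialization for linear and embedding weights."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
    if isinstance(module, nn.Linear) and module.bias is not None:
        nn.init.zeros_(module.bias)

# Usage: model.apply(init_weights)
```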
It's implemented as a collection of small modules, each depending on the one below it:
- TransformerLM
- Transformer
- Decoder
- MultiHeadAttention
- Attention
They're all implemented in gpt/.
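A minimal sketch of how these modules might nest (the class names come from the list above, but the constructor signatures and internals are assumptions; dropout, initialization and the exact norm placement are simplified or omitted):

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Scaled dot-product attention for a single head, with a causal mask."""
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v = nn.Linear(d_model, d_head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))  # block attention to future tokens
        return scores.softmax(dim=-1) @ v

class MultiHeadAttention(nn.Module):
    """Runs several Attention heads in parallel and projects their concatenation."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        d_head = d_model // n_heads
        self.heads = nn.ModuleList(Attention(d_model, d_head) for _ in range(n_heads))
        self.proj = nn.Linear(n_heads * d_head, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))

class Decoder(nn.Module):
    """One block: masked attention and an MLP, each with a residual connection."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = MultiHeadAttention(d_model, n_heads)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.ln1(x))
        return x + self.mlp(self.ln2(x))

class Transformer(nn.Module):
    """A stack of Decoder blocks."""
    def __init__(self, n_layers: int, d_model: int, n_heads: int):
        super().__init__()
        self.blocks = nn.Sequential(*(Decoder(d_model, n_heads) for _ in range(n_layers)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.blocks(x)

class TransformerLM(nn.Module):
    """Token + learned positional embeddings, a Transformer, and an unembedding."""
    def __init__(self, vocab_size: int, max_len: int, n_layers: int, d_model: int, n_heads: int):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)  # learned positional encoding
        self.transformer = Transformer(n_layers, d_model, n_heads)
        self.unembed = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(tokens.shape[-1], device=tokens.device)
        x = self.tok(tokens) + self.pos(positions)
        return self.unembed(self.transformer(x))  # logits over the vocabulary
```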
Tensor shapes and types are annotated with torchtyping, which hooks into typeguard and its pytest plugin, so the annotations are automatically enforced in tests.
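For illustration, an annotated forward might look like this (the dimension names and the example pytest invocation are assumptions, not copied from gpt/):

```python
import torch
from torchtyping import TensorType, patch_typeguard
from typeguard import typechecked

patch_typeguard()  # teach typeguard to understand TensorType annotations

@typechecked
def forward(x: TensorType["batch", "seq", "d_model"]) -> TensorType["batch", "seq", "d_model"]:
    # Named dims are checked for consistency at call time whenever typeguard is
    # active, e.g. when tests run under its pytest plugin:
    #     pytest --typeguard-packages=gpt
    return x

forward(torch.randn(2, 5, 16))  # passes: input and output shapes agree
```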
Each module except TransformerLM is tested. Most tests just run forward and backward passes on a bunch of randomly shaped inputs; some check properties that should generally hold. Only Attention has a conventional unit test, comparing output values against ones I computed by hand.
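A random-shape forward/backward test of that flavor might look roughly like this (the import path and constructor signature are assumptions):

```python
import pytest
import torch

from gpt import MultiHeadAttention  # import path and signature assumed for illustration

@pytest.mark.parametrize("batch,seq,d_model,n_heads", [(1, 3, 16, 2), (4, 7, 32, 4)])
def test_forward_backward_random_shapes(batch, seq, d_model, n_heads):
    layer = MultiHeadAttention(d_model, n_heads)
    x = torch.randn(batch, seq, d_model, requires_grad=True)
    out = layer(x)
    assert out.shape == (batch, seq, d_model)  # shape is preserved
    out.sum().backward()                       # backward runs without error
    assert x.grad is not None                  # gradients reach the input
```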
I have yet to train it on a toy dataset.