For something in between PyTorch and karpathy/micrograd.
This may not be the best deep learning framework, but it is a deep learning framework.
The sub-1000-line core of it is in tinygrad/.
Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. Support the simple basic ops, and you get SOTA vision (models/efficientnet.py) and language (models/transformer.py) models.
We are working on support for the Apple Neural Engine and the Google TPU in the accel/ folder. Eventually, we will build custom hardware for tinygrad, and it will be blindingly fast. Now, it is slow.
This project is maintained by tiny corp.
python3 -m pip install git+https://github.com/geohot/tinygrad.git
# or
git clone https://github.com/geohot/tinygrad.git
cd tinygrad
python3 -m pip install -e .
There's a lot of interest in tinygrad lately. Here are some guidelines for contributing:
- Bugfixes are the best and always welcome! Like this one.
- If you don't understand the code you are changing, don't change it!
- All code golf PRs will be closed, but conceptual cleanups are great.
- Features are welcome. Though if you are adding a feature, you need to include tests.
- Improving test coverage is great, with reliable, non-brittle tests.
from tinygrad.tensor import Tensor
x = Tensor.eye(3, requires_grad=True)
y = Tensor([[2.0,0,-2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()
print(x.grad.numpy()) # dz/dx
print(y.grad.numpy()) # dz/dy
The same example in PyTorch:
import torch
x = torch.eye(3, requires_grad=True)
y = torch.tensor([[2.0,0,-2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()
print(x.grad) # dz/dx
print(y.grad) # dz/dy
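Both snippets compute z = sum(y @ x), so the gradients can be written down by hand: dz/dx[i][j] = y[i] and dz/dy[i] = sum over j of x[i][j]. As a sanity check, here is a minimal plain-Python sketch that confirms those values with central finite differences. The helper names `z_of` and `numeric_grad` are made up for illustration and are not part of tinygrad or PyTorch.

```python
# Plain-Python check of the gradients in the example above (no framework needed).

def z_of(x, y):
    # z = sum(y @ x) for a 1xN row vector y and an NxN matrix x
    n = len(x)
    return sum(y[0][i] * x[i][j] for i in range(n) for j in range(n))

def numeric_grad(f, m, eps=1e-6):
    # Central finite differences, perturbing one entry of m at a time
    grad = [[0.0] * len(row) for row in m]
    for i, row in enumerate(m):
        for j in range(len(row)):
            old = row[j]
            row[j] = old + eps
            hi = f()
            row[j] = old - eps
            lo = f()
            row[j] = old  # restore the entry before moving on
            grad[i][j] = (hi - lo) / (2 * eps)
    return grad

x = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # eye(3)
y = [[2.0, 0.0, -2.0]]

dz_dx = numeric_grad(lambda: z_of(x, y), x)  # row i should be filled with y[i]
dz_dy = numeric_grad(lambda: z_of(x, y), y)  # row sums of eye(3), so all ones

print([[round(v, 4) for v in row] for row in dz_dx])
print([[round(v, 4) for v in row] for row in dz_dy])
```

Because z is linear in both x and y, the finite-difference estimate should match the analytic gradient up to floating-point rounding.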
Try a matmul. See how, despite the style, it is fused into one kernel with the power of laziness.
DEBUG=3 OPTLOCAL=1 python3 -c "from tinygrad.tensor import Tensor;
N = 1024; a, b = Tensor.randn(N, N), Tensor.randn(N, N);
c = (a.reshape(N, 1, N) * b.permute(1,0).reshape(1, N, N)).sum(axis=2);
print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
Change to DEBUG=4 to see the generated code.
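To see why the reshape/permute/multiply/sum expression is a matmul at all, here is a small eager sketch in plain Python with nested lists. It spells out each step of the one-liner on a 4x4 problem and compares against a textbook triple loop. Variable names such as `bt` and `prod` are illustrative only.

```python
import random

N = 4  # kept small so the pure-Python loops stay readable
random.seed(0)
a = [[random.gauss(0, 1) for _ in range(N)] for _ in range(N)]
b = [[random.gauss(0, 1) for _ in range(N)] for _ in range(N)]

# b.permute(1,0): bt[j][k] == b[k][j]
bt = [[b[k][j] for k in range(N)] for j in range(N)]

# a.reshape(N,1,N) * bt.reshape(1,N,N) broadcasts to an N x N x N tensor
# whose entry [i][j][k] is a[i][k] * bt[j][k]
prod = [[[a[i][k] * bt[j][k] for k in range(N)] for j in range(N)] for i in range(N)]

# .sum(axis=2) reduces over k, leaving an N x N result
c = [[sum(prod[i][j]) for j in range(N)] for i in range(N)]

# Reference: textbook matrix multiply
ref = [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

max_err = max(abs(c[i][j] - ref[i][j]) for i in range(N) for j in range(N))
print(max_err)
```

This eager version materializes all N^3 products in `prod` before reducing. The point of the lazy evaluation described above is that the multiply and the sum end up in one kernel, so that intermediate never has to be stored.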
It turns out, a decent autograd tensor library is 90% of what you need for neural networks. Add an optimizer (SGD, Adam, AdamW implemented) from tinygrad.nn.optim, write some boilerplate minibatching code, and you have all you need.
from tinygrad.tensor import Tensor
import tinygrad.nn.optim as optim

class TinyBobNet:
  def __init__(self):
    self.l1 = Tensor.uniform(784, 128)
    self.l2 = Tensor.uniform(128, 10)

  def forward(self, x):
    return x.dot(self.l1).relu().dot(self.l2).log_softmax()

model = TinyBobNet()
optim = optim.SGD([model.l1, model.l2], lr=0.001)

# ... and complete like pytorch, with (x,y) data

out = model.forward(x)
loss = out.mul(y).mean()
optim.zero_grad()
loss.backward()
optim.step()
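The "boilerplate minibatching code" and the shape of `y` are left open above. The following plain-Python sketch shows one way they could look. It assumes `y` holds `-num_classes` at the label position and 0 elsewhere, so that `out.mul(y).mean()` over all entries equals the mean negative log-likelihood; that encoding is an assumption, not something stated here. The helpers `sample_minibatch` and `encode_targets` are hypothetical, and random logits stand in for `model.forward(x)`.

```python
import math
import random

NUM_CLASSES = 10

def sample_minibatch(xs, labels, batch_size, rng):
    # Hypothetical helper: sample example indices with replacement
    idx = [rng.randrange(len(xs)) for _ in range(batch_size)]
    return [xs[i] for i in idx], [labels[i] for i in idx]

def encode_targets(labels, num_classes=NUM_CLASSES):
    # Assumed encoding: -num_classes at the label position, 0 elsewhere, so that
    # mean(out * y) over all B * num_classes entries equals the mean NLL
    y = [[0.0] * num_classes for _ in labels]
    for row, label in zip(y, labels):
        row[label] = -float(num_classes)
    return y

def log_softmax(row):
    # Numerically stable log-softmax of one row of logits
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

rng = random.Random(0)
xs = [[rng.random() for _ in range(784)] for _ in range(32)]  # fake inputs
labels = [rng.randrange(NUM_CLASSES) for _ in range(32)]      # fake labels

bx, by = sample_minibatch(xs, labels, batch_size=8, rng=rng)
y = encode_targets(by)

# Stand-in for model.forward(bx): random logits passed through log_softmax
out = [log_softmax([rng.gauss(0, 1) for _ in range(NUM_CLASSES)]) for _ in by]

# Equivalent of out.mul(y).mean()
loss = sum(o * t for orow, trow in zip(out, y) for o, t in zip(orow, trow)) / (len(by) * NUM_CLASSES)
# Direct mean negative log-likelihood, for comparison
nll = -sum(orow[label] for orow, label in zip(out, by)) / len(by)
print(loss, nll)
```

In a real loop, `bx` and `y` would be wrapped in `Tensor` objects and fed to the five training lines above once per step.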
tinygrad supports GPUs through PyOpenCL.
from tinygrad.tensor import Tensor
(Tensor.ones(4,4).gpu() + Tensor.ones(4,4).gpu()).cpu()
hlops are syntactic sugar around mlops. They support most things torch does.
mlops are mid level ops. They understand derivatives. They are very simple.
Relu, Log, Exp, Sin # unary ops
Sum, Max # reduce ops (with axis argument)
Maximum, Add, Sub, Mul, Pow, Div, Equal # binary ops (no broadcasting, use expand)
Expand, Reshape, Permute, Pad, Shrink, Flip # movement ops
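To make "they understand derivatives" concrete, here is a minimal sketch of what an op with a forward and a backward pass could look like, written over plain Python lists. The class layout is an assumption for illustration and is not tinygrad's actual mlops code; only the op names Relu and Mul come from the list above.

```python
class Relu:
    def forward(self, x):
        # Remember where the input was positive; backward needs it
        self.mask = [v > 0 for v in x]
        return [v if m else 0.0 for v, m in zip(x, self.mask)]

    def backward(self, grad_out):
        # Gradient passes through only where the input was positive
        return [g if m else 0.0 for g, m in zip(grad_out, self.mask)]

class Mul:
    def forward(self, x, y):
        # Same-size inputs only, mirroring "no broadcasting, use expand"
        assert len(x) == len(y)
        self.x, self.y = x, y
        return [a * b for a, b in zip(x, y)]

    def backward(self, grad_out):
        # d(x*y)/dx = y and d(x*y)/dy = x, each scaled by the incoming gradient
        return ([g * b for g, b in zip(grad_out, self.y)],
                [g * a for g, a in zip(grad_out, self.x)])

# z = sum(relu(x * w))
x = [1.0, -2.0, 3.0]
w = [0.5, 0.5, -1.0]
mul, relu = Mul(), Relu()
out = relu.forward(mul.forward(x, w))
z = sum(out)

grad_out = [1.0] * len(out)  # d(sum)/d(out) is 1 for every element
grad_x, grad_w = mul.backward(relu.backward(grad_out))
print(z, grad_x, grad_w)
```

Chaining `backward` calls in reverse order of the `forward` calls is the whole of reverse-mode autodiff in this toy setting; a real framework records that order automatically.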
You no longer need to write mlops for a new accelerator. The autodiff stuff is all in mlops now, so you can focus on the raw operations:
Buffer # class of memory on this device
unary_op (NOOP, EXP, LOG, CAST, SIN) # A -> A
reduce_op (SUM, MAX) # A -> B (smaller size, B has 1 in shape)
binary_op (ADD, SUB, MUL, DIV, POW, CMPEQ, MAX) # A + A -> A (all the same size)
movement_op (EXPAND, RESHAPE, PERMUTE, PAD, SHRINK, STRIDE) # A -> B (different size)
fused_op [[optional]] (MULACC) # A * A -> B
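As a rough illustration of how little a backend has to provide, here is a toy implementation of a few of these raw operations on a 2D buffer stored as a flat Python list. The function signatures, the `ToyBuffer` class, and the restriction to two dimensions are all assumptions made for this sketch; only the op names are taken from the list above, and CAST plus most movement ops are left out.

```python
import math
import operator
from functools import reduce

class ToyBuffer:
    # Hypothetical: a 2D row-major buffer backed by a flat Python list
    def __init__(self, shape, data):
        assert len(data) == shape[0] * shape[1]
        self.shape, self.data = shape, list(data)

UNARY = {"NOOP": lambda v: v, "EXP": math.exp, "LOG": math.log, "SIN": math.sin}
BINARY = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul,
          "DIV": operator.truediv, "POW": operator.pow,
          "CMPEQ": lambda a, b: float(a == b), "MAX": max}
REDUCE = {"SUM": operator.add, "MAX": max}

def unary_op(op, a):
    # A -> A, elementwise
    return ToyBuffer(a.shape, [UNARY[op](v) for v in a.data])

def binary_op(op, a, b):
    # A + A -> A, both inputs must already have the same shape
    assert a.shape == b.shape, "no broadcasting: expand first"
    return ToyBuffer(a.shape, [BINARY[op](x, y) for x, y in zip(a.data, b.data)])

def reduce_op(op, a, axis):
    # A -> B, where the reduced axis of B has size 1
    rows, cols = a.shape
    if axis == 1:
        out = [reduce(REDUCE[op], a.data[r * cols:(r + 1) * cols]) for r in range(rows)]
        return ToyBuffer((rows, 1), out)
    out = [reduce(REDUCE[op], a.data[c::cols]) for c in range(cols)]
    return ToyBuffer((1, cols), out)

def expand(a, shape):
    # Movement op EXPAND: repeat the buffer along any axis of size 1
    rows, cols = shape
    ar, ac = a.shape
    assert ar in (1, rows) and ac in (1, cols)
    data = [a.data[(r if ar > 1 else 0) * ac + (c if ac > 1 else 0)]
            for r in range(rows) for c in range(cols)]
    return ToyBuffer(shape, data)

a = ToyBuffer((2, 3), [1, 2, 3, 4, 5, 6])
bias = ToyBuffer((1, 3), [10, 20, 30])
summed = binary_op("ADD", a, expand(bias, a.shape))  # explicit expand, then add
row_sums = reduce_op("SUM", summed, axis=1)
col_max = reduce_op("MAX", a, axis=0)
print(summed.data, row_sums.shape, row_sums.data, col_max.data)
```

The optional MULACC corresponds to doing `binary_op("MUL", ...)` followed by `reduce_op("SUM", ...)` in a single pass, which is what a backend would implement to avoid storing the intermediate product.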
Despite being tiny, tinygrad supports the full EfficientNet. Pass in a picture to discover what it is.
python3 examples/efficientnet.py https://media.istockphoto.com/photos/hen-picture-id831791190
Or, if you have a webcam and cv2 installed
python3 examples/efficientnet.py webcam
PROTIP: Set "DEBUG=2" environment variable if you want to see why it's slow.
You might need to download the Stable Diffusion weights and put them into weights/
Run python3 examples/stable_diffusion.py
"a horse sized cat eating a bagel"
After putting the weights in weights/LLaMA, you can have a chat with Stacy. She lives inside tinygrad.
python3 examples/llama.py
See examples/mnist_gan.py
See examples/yolov3.py
GRAPH=1 python3 test/models/test_mnist.py TestMNIST.test_sgd_onestep
# requires dot, outputs /tmp/net.svg
For more examples of how to run the full test suite, please refer to the CI workflow.
python3 -m pip install -e '.[testing]'
python3 -m pytest
python3 -m pytest -v -k TestTrain
python3 ./test/models/test_train.py TestTrain.test_efficientnet