No reduction in VRAM usage #17
Comments
Hi, we are also working on merging MatmulFreeLLM into the BitBLAS examples. In the meantime, you can try the BitNet example to achieve a similar level of VRAM reduction, which should be comparable to our model.
I see. So we still have to wait until the repo works fully with BitBLAS, and until then we can neither reproduce the results from the paper nor do training, right?
Training is fine: we have integrated Triton in our current repo, so you can still enjoy the accelerated training. For inference, maybe not…
Wait, so you could still train a model and get faster training plus VRAM reduction? It just doesn't work for inference? I might be wrong here, but how would we evaluate the model's losses and outputs during and after training? For a bit of context, I want to train a video generative model.
Hi, the fused BitLinear in small MLPs will not be significantly accelerated with Triton. You can verify this by testing cases where in_features/out_features > 2048. Regarding Matmul-free operations: as mentioned in our paper, modern GPUs don't actually benefit from Matmul-free approaches, which is why we developed our own FPGA hardware implementation and why we still keep Matmul operations in our code. Our solution is compatible with both Matmul-free and Matmul approaches. Matmul-free can provide benefits on custom hardware, while retaining Matmul operations often yields better performance on general-purpose GPUs. We therefore use the fused version to leverage GPU training speed, and rely on custom hardware to fully benefit from Matmul-free operations.
I tried running the following code, with just the `ridger/MMfreeLM-1.3B` model initialized:
With `watch rocm-smi` running in another terminal, it showed 68% VRAM usage, i.e. about 5.5 GB.
Doesn't this contradict what was said in the paper?
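For a rough sanity check on that figure, here is a back-of-the-envelope sketch of how much memory the weights alone would take at different storage precisions. It assumes a nominal 1.3e9 parameters (read off the model name, so the real count may differ) and ignores activations, caches, and runtime overhead. The helper `weight_memory_gb` and the list of precisions are hypothetical illustrations, not anything from the repo or the paper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1e9 bytes).

    Counts only the parameters themselves; activations, allocator
    caches, and framework overhead are not included.
    """
    return num_params * bits_per_param / 8 / 1e9


# Assumption: nominal parameter count taken from the model name.
NUM_PARAMS = 1.3e9

# Storage precisions to compare; which one applies depends on how the
# checkpoint is actually loaded, which this sketch does not know.
PRECISIONS = [
    ("fp32", 32),
    ("fp16/bf16", 16),
    ("int8", 8),
    ("2-bit packed ternary", 2),
]

for label, bits in PRECISIONS:
    print(f"{label:>22}: {weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
```

If the weights are held in full precision on the GPU, the fp32 row is the one to compare against the reading from `rocm-smi`; a packed low-bit layout would be needed to see numbers closer to the last row. This is only a possible explanation, not a confirmed diagnosis.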