
support quantized models #812

Open
tharvik opened this issue Oct 24, 2024 · 0 comments
Labels
feature New feature or request

Comments


tharvik commented Oct 24, 2024

Currently we use float32 tensors nearly everywhere, which yields very large models. After discussion with @martinjaggi: training is hard to do without float32, but inference can probably use uint8 tensors, shrinking trained models by up to 4x.

Note: verify that the model still behaves correctly after quantization.
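To make the idea concrete, here is a minimal sketch of post-training affine quantization in NumPy. It is an illustration only, not the project's implementation: the function names and the per-tensor scale/zero-point scheme are assumptions, and a real integration would operate on the project's own tensor library rather than NumPy arrays.

```python
import numpy as np

def quantize_uint8(weights: np.ndarray) -> tuple[np.ndarray, float, int]:
    """Affine-quantize a float32 tensor to uint8: w ~ (q - zero_point) * scale.

    Hypothetical helper for illustration; per-tensor scheme, one scale and
    zero-point for the whole array.
    """
    w_min, w_max = float(weights.min()), float(weights.max())
    # Map [w_min, w_max] onto the 256 representable uint8 values.
    scale = (w_max - w_min) / 255.0 or 1.0  # guard against constant tensors
    zero_point = int(round(-w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_uint8(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximate float32 tensor for inference-time use."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale, zp = quantize_uint8(w)
w_hat = dequantize_uint8(q, scale, zp)

assert q.nbytes * 4 == w.nbytes                # uint8 storage is 4x smaller
assert np.max(np.abs(w - w_hat)) <= scale      # per-element error within one step
```

The 4x size reduction comes directly from storing 1 byte instead of 4 per weight; the last assertion is the kind of check the note above asks for, though a proper check would compare end-to-end model accuracy before and after quantization, not just per-weight error.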

@tharvik tharvik added the feature New feature or request label Oct 24, 2024