🚀 Feature
HF transformers implements 8-bit and 4-bit quantization (via bitsandbytes). It would be great if that feature could be leveraged for the xlm-r-xxl machine translation eval model.
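For reference, a minimal sketch of what I mean, assuming the eval model's encoder can be loaded through transformers. The `facebook/xlm-roberta-xxl` checkpoint is an illustrative stand-in for the actual eval checkpoint:

```python
# Sketch: loading an XLM-R XXL encoder with bitsandbytes quantization via
# HF transformers. Requires `bitsandbytes` and `accelerate` to be installed.
# The checkpoint name is a stand-in; the real eval model wraps this encoder
# and its scoring head would need to be handled separately.
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # or load_in_8bit=True
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for 4-bit matmuls
)

model = AutoModel.from_pretrained(
    "facebook/xlm-roberta-xxl",
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place layers across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained("facebook/xlm-roberta-xxl")
```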
Motivation
The xlm-r-xxl model is too large for most commodity GPUs. To widen access to top-performing translation evaluation, please support a quantized version.
Alternatives
I have seen a few libraries that quantize BERT-family models outside the HF ecosystem.
Additional context
I tried loading the large model in 8-bit with HF transformers (without automatic device mapping). The model loaded and used about 14 GB of VRAM, but I don't know how to run the eval with it from there.
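In case it helps whoever picks this up, here is a hedged sketch of running a forward pass with the quantized encoder loaded above. The example sentence pair is made up, and the real eval model adds a scoring head on top of the encoder that this does not cover, so it only demonstrates that inference through the 8-bit/4-bit weights works:

```python
# Sketch: forward pass through the quantized encoder from the snippet above.
# Assumes `model` and `tokenizer` are already loaded; the eval model's
# regression/scoring head is NOT included here.
import torch

src = "The cat sat on the mat."                 # hypothetical source sentence
mt = "Le chat était assis sur le tapis."        # hypothetical MT output

# Encode the pair and move tensors to wherever the model was placed.
inputs = tokenizer(src, mt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)

# Pooled sentence representation (first token) from the quantized encoder;
# a real eval model would feed something like this into its scoring head.
cls_embedding = outputs.last_hidden_state[:, 0]
print(cls_embedding.shape)
```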