[Needs more investigation] int8_weight_only via quantize_() API on torch.float16 models results in NaN values across multiple CPU architectures #1662
Comments
I can confirm this. I also noticed it the other day but did not dig deeper. If the base weights are in bfloat16, the issue does not occur.
Thanks for reporting this issue. I will take a look at it.
It seems like an overflow issue. Hi @vmpuri @psinger, did you see the same issue on GPU? I drafted a PR to fix it: #1698
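For reference, a minimal sketch of the suspected failure mode (not taken from the PR; the weight value, scale, and follow-up op below are made up for illustration): float16 tops out at 65504, so a dequantize-and-multiply path kept entirely in float16 can overflow to inf, and inf becomes NaN at the next inf - inf style operation, while bfloat16's wider exponent range avoids the overflow.

```python
import torch

# float16's largest finite value is 65504, so fp16 intermediates overflow easily.
print(torch.finfo(torch.float16).max)             # 65504.0

q_w   = torch.tensor(127, dtype=torch.int8)       # hypothetical max int8 weight
scale = torch.tensor(600.0, dtype=torch.float16)  # hypothetical per-channel scale

# Dequantizing in fp16 overflows to inf (127 * 600 = 76200 > 65504) ...
deq = q_w.to(torch.float16) * scale
print(deq)                                        # inf

# ... and inf turns into NaN as soon as it hits an inf - inf style op,
# which softmax/layernorm-type normalizations perform internally.
print(deq - deq)                                  # nan

# bfloat16 shares float32's exponent range, so the same math stays finite,
# matching the report that bfloat16 models are unaffected.
print(q_w.to(torch.bfloat16) * scale.to(torch.bfloat16))  # finite
```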
Note: I'll work on seeing if this reproduces with a non-torchchat example.
While working on migrating torchchat's WeightOnlyInt8Quantizer to AO's quantize_(model, int8_weight_only()) API, I ran into issues where values would go to NaN after a few layers if the model's dtype was initially float16. This seems to occur across multiple platforms (tested with MPS, Mac CPU, x86 CPU), so I'm not sure whether it's a hardware-specific issue. Interestingly, the error does not occur if the model dtype is set to bfloat16.

To repro, you can check out this PR with the migration in torchchat and run a model using:
You'll notice the model just outputs "!" tokens - representing NaN. If you add a debug hook to the model, you can identify that some values in the intermediate tensors get very close to 0 just before NaN values are detected.
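As a torchchat-free starting point, here is a rough sketch of the repro shape described above, assuming the quantize_ and int8_weight_only entry points are importable from torchao.quantization as in the report. The toy model and layer sizes are made up, so it may not actually overflow with random weights; it only illustrates the quantize_ call and the kind of NaN-detection hook mentioned above.

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

# Toy stand-in model (sizes are arbitrary); a real repro would load an
# actual checkpoint in float16 as described above.
model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512),
).to(torch.float16).eval()

quantize_(model, int8_weight_only())

# Debug hook: report any leaf module whose output contains NaN or inf.
def make_nan_hook(name):
    def hook(_module, _inputs, output):
        if isinstance(output, torch.Tensor) and (
            torch.isnan(output).any() or torch.isinf(output).any()
        ):
            print(f"NaN/inf detected after module: {name}")
    return hook

for name, module in model.named_modules():
    if len(list(module.children())) == 0:  # leaf modules only
        module.register_forward_hook(make_nan_hook(name))

with torch.no_grad():
    out = model(torch.randn(1, 512, dtype=torch.float16))
print(out)
```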