Bug: Can't quantize 405B Mega merge #8528
Comments
#7359 broke models with more than 256 layers.
Ooo I see... Was that on purpose, or a consequence of supporting that model? Could it be patched, or is it a hard limit?
It's a consequence of keeping the layer-wise hparams in fixed-size arrays. Making the layer-wise hparams take less space when not needed is something I'll likely fix eventually, so that the limit only applies to models which need layer-wise hparams.
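
For context, here is a minimal sketch of the pattern that produces this kind of failure, assuming per-layer hyperparameters are stored in fixed-size arrays with a hard-coded maximum; the actual constant, struct, and field names in llama.cpp may differ:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical maximum layer count, analogous to the N_MAX in the assert.
constexpr uint32_t N_MAX = 256;

struct hparams {
    uint32_t n_layer = 0;
    // Per-layer ("layer-wise") hyperparameters kept in fixed-size arrays,
    // which is what ties the supported layer count to N_MAX.
    std::array<uint32_t, N_MAX> n_head_per_layer{};
    std::array<uint32_t, N_MAX> n_ff_per_layer{};

    void set_n_layer(uint32_t n) {
        // A model with more layers than the arrays can hold trips this check,
        // producing an error of the same shape as "GGML_ASSERT: n <= N_MAX".
        assert(n <= N_MAX && "model has more layers than the fixed-size hparams arrays allow");
        n_layer = n;
    }
};

int main() {
    hparams hp;
    hp.set_n_layer(300); // 300 is an arbitrary illustrative value above the 256 limit
    return 0;
}
```

Under that assumption, the limit would be lifted either by raising the constant or by sizing the per-layer storage dynamically, which sounds like the eventual fix described above.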
It appears to be an arbitrary limit, even though an int64 can represent an absurd 9x10^18 before overflowing. I don't know why, but hard-coded caps like this seem fairly unique to the machine-learning space, even though they make the code needlessly brittle and user-hostile.
What happened?
Trying to quantize https://huggingface.co/TensorWave/Meta-Llama-3-405B-Instruct-Up-Merge
I was able to convert without issue, but when trying to quantize I get an annoyingly generic assert:
GGML_ASSERT: src/llama.cpp:3973: n <= N_MAX
Is there anything I can do to get more useful output or debugging information?
Name and Version
b3389
What operating system are you seeing the problem on?
No response
Relevant log output