imatrix : use GGUF to store importance matrices #9400
Draft · compilade wants to merge 13 commits into master from compilade/imatrix-batched-chunks (+479 −172)

Conversation
Commits include:

* perplexity : simplify filling the batch
* Sums and counts tensors no longer need to be consecutive
* imatrix : more sanity checks when loading multiple imatrix files
* imatrix : use ggml_format_name instead of std::string concatenation

Co-authored-by: Xuan Son Nguyen <[email protected]>

compilade added the enhancement, breaking change, refactoring, examples, python, and Review Complexity : Medium labels on Sep 10, 2024. ngxson reviewed on Sep 10, 2024.
compilade (author) commented:

I'm setting this to "draft" because of concerns raised by @ikawrakow in ikawrakow/ik_llama.cpp#15 (comment) and ikawrakow/ik_llama.cpp#15 (comment), mostly related to the fact that GGUF is harder to parse than […]. More details near the end of ikawrakow/ik_llama.cpp#15 (reply in thread). I'll need some days to think about how to go further with this.
Description

Follow-up from ikawrakow/ik_llama.cpp#15 (reply in thread).

Using GGUF as the format for `imatrix` files will be useful for further experiments (e.g. with L²QER) and for compatibility with existing or future GGUF tooling (e.g. GGUF previews on HuggingFace, graphical GGUF viewer(s) (#6715), some kind of `gguf-diff`, etc.).

There are multiple problems with `imatrix` which this is addressing:

* Saving depends on `unordered_map` iteration order, which makes `sha256sum` useless to compare `imatrix` files made on the same dataset.
* The frequency of intermediate saves depends on `-ub` (intermediate saves happen waaay too often).

Summary of changes
* Use GGUF to store `imatrix` data.
  * `general.type` is `imatrix`.
  * There is no `general.architecture` in `imatrix` files.
  * Store `*.sums` and `*.counts` for each tensor with imatrix data (see the sketch after this list).
    * `*.sums` are the sums of activations, stored in `F32`, like before.
    * `*.counts` are the number of activations (also the number of tokens), useful to calculate the mean and to merge `imatrix` files together with `--in-file`.
      * They are stored in `F32` even though they hold integer values, because when calculating the mean they would be converted to `F32` anyway to perform the division.
* Add `convert_legacy_imatrix_to_gguf.py` to convert old `imatrix.dat` files to `imatrix.gguf`.
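Under this layout, an `imatrix.gguf` file can be inspected with the existing gguf-py tooling. The following is a minimal sketch, not code from this PR: the `imatrix_means` helper is hypothetical, and the exact tensor shapes (and whether `*.counts` broadcasts against `*.sums`) are assumptions on my part.

```python
import sys

import numpy as np
from gguf import GGUFReader  # llama.cpp's gguf-py package (pip install gguf)

def imatrix_means(path: str) -> dict[str, np.ndarray]:
    """Pair up `*.sums` and `*.counts` tensors and return per-tensor means."""
    reader = GGUFReader(path)
    tensors = {t.name: t for t in reader.tensors}
    means: dict[str, np.ndarray] = {}
    for name, tensor in tensors.items():
        if not name.endswith(".sums"):
            continue
        base = name[: -len(".sums")]
        counts = tensors.get(base + ".counts")
        if counts is None:
            continue  # no matching counts tensor; skip
        sums = np.asarray(tensor.data, dtype=np.float32)
        cnts = np.asarray(counts.data, dtype=np.float32)
        # counts is stored as F32 (even though the values are integers)
        # precisely so this division needs no extra conversion.
        # NOTE: assumes cnts broadcasts against sums.
        means[base] = sums / cnts
    return means

if __name__ == "__main__":
    for base, mean in imatrix_means(sys.argv[1]).items():
        print(f"{base}: {mean.size} mean activation values")
```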
* Like `llama-perplexity` since #5946 (perplexity : support using multiple sequences to allow larger batch sizes), allow computing multiple chunks per batch with `llama-imatrix`.
* Use fused multiply-add (`std::fma`) when accumulating the sums of activations (the precision concern is illustrated below).
  * (Using `f64` would be even better, but I'm not sure it's worth it yet. For the curious, `double` intermediate accumulations can be tried by changing only one line in `IMatrixStats`: `vector<float> values` to `vector<double> values`.)
* Save the tensors in a deterministic order instead of relying on `unordered_map` iteration order.
  * This means `sha256sum` can be meaningfully used to compare `imatrix` files generated in very similar conditions.
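As a rough illustration of why accumulation precision matters here (plain numpy, not the llama.cpp code, and `std::fma` itself is not shown), a long float32 running sum of squared activations drifts measurably from a float64 one; the size of the drift depends on the data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Squared activations: all positive, so the running sum keeps growing
# and float32 rounding error becomes visible.
vals = rng.standard_normal(250_000).astype(np.float32) ** 2

acc32 = np.float32(0.0)            # like `vector<float> values`
for v in vals:
    acc32 = np.float32(acc32 + v)  # plain float32 accumulation

acc64 = float(np.sum(vals.astype(np.float64)))  # like `vector<double> values`

print(f"float32 accumulator: {float(acc32):.3f}")
print(f"float64 accumulator: {acc64:.3f}")
print(f"relative error:      {abs(float(acc32) - acc64) / acc64:.2e}")
```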
TODO

- [ ] Compare quants made by `llama-quantize` with an old `imatrix.dat` against quants made by the new `llama-quantize` with a converted `imatrix.gguf`; the results should match, which can be verified with `sha256sum`.
- [ ] Test `llama-imatrix` at different batch sizes, e.g. `-ub 64 -b 512` and `-ub 512 -b 2048` for a chunk size of 512 (`-c 512`).
- [ ] Compare perplexity of quants made with the old `llama-imatrix` vs the new `llama-imatrix`.
- [ ] Test `--in-file` with `llama-imatrix` (the merge semantics are sketched below).
- [ ] Figure out how to handle the `general.architecture` exclusion. Currently, `self.add_architecture()` is made a no-op, but maybe `general.architecture` should simply be excluded when `self.arch == ""`. Not sure how to prevent using the other `self.add_*` methods (in `GGUFWriter`) which expect `self.arch` to be something.
- [ ] Should `llama-imatrix` also be able to output old `imatrix.dat` files?
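Conceptually, merging imatrix files with `--in-file` amounts to adding the `*.sums` and `*.counts` tensors elementwise and only deriving means afterwards. The sketch below is my reading of those semantics, not the PR's implementation; the `merged_stats` helper and the use of gguf-py are illustrative.

```python
import numpy as np
from gguf import GGUFReader

def merged_stats(paths: list[str]) -> dict[str, tuple[np.ndarray, np.ndarray]]:
    """Accumulate (sums, counts) pairs across several imatrix GGUF files."""
    merged: dict[str, tuple[np.ndarray, np.ndarray]] = {}
    for path in paths:
        data = {t.name: np.asarray(t.data, dtype=np.float32)
                for t in GGUFReader(path).tensors}
        for name, sums in data.items():
            if not name.endswith(".sums"):
                continue
            base = name[: -len(".sums")]
            counts = data.get(base + ".counts")
            if counts is None:
                continue  # malformed input; skip this tensor
            if base in merged:
                s, c = merged[base]
                merged[base] = (s + sums, c + counts)  # elementwise accumulation
            else:
                merged[base] = (sums.copy(), counts.copy())
    return merged
```

Keeping sums and counts separate means the merged mean `sums / counts` automatically weights each input file by how many tokens it actually saw, instead of naively averaging per-file means.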