gguf-dump.py: add --markdown dump output #7853
Conversation
FYI, the context of this was that I was trying to understand the GGUF file standard a bit more. I also attempted to get the dump to print as a Graphviz dot file, but it was quite a pain to get working. Markdown, however, is quite easy, so this seems like a good compromise. The main benefit I can see is in automatically grouping all the blocks together.
Force-pushed from aff4168 to f15ce9f
Force-pushed from c0439a4 to e38b649
@compilade updated the script with your suggestion. Also updated the example output with a 'phi-2.Q6_K.gguf - GGUF Internal File Dump' example so you can see how it looks now.
@compilade looks good, thanks
There are still things about sub-column alignment which could be improved, although at least now the column bars are always aligned across rows.
Adjusted:

### <a name="blk_1">Block 1 Tensor Group : ~79M Elements</a>
| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|-----:|:-------------------------|:----------------------------------------|:-----------------|:----------------------|:-----|
| 11 | blk.1.attn_norm.bias | Block 1 Attention Normalization (B) | ( ~3K) 2560 | 2560 x 1 x 1 x 1 | F32 |
| 12 | blk.1.attn_norm.weight | Block 1 Attention Normalization (W) | ( ~3K) 2560 | 2560 x 1 x 1 x 1 | F32 |
| 13 | blk.1.attn_qkv.bias | Block 1 Attention Query-Key-Value (B) | ( ~8K) 7680 | 7680 x 1 x 1 x 1 | F32 |
| 14 | blk.1.attn_qkv.weight | Block 1 Attention Query-Key-Value (W) | ( ~20M) 19660800 | 2560 x 7680 x 1 x 1 | Q6_K |
| 15 | blk.1.attn_output.bias | Block 1 Attention Output (B) | ( ~3K) 2560 | 2560 x 1 x 1 x 1 | F32 |
| 16 | blk.1.attn_output.weight | Block 1 Attention Output (W) | ( ~7M) 6553600 | 2560 x 2560 x 1 x 1 | Q6_K |
| 17 | blk.1.ffn_up.bias | Block 1 Feed-Forward Network "Up" (B) | ( ~10K) 10240 | 10240 x 1 x 1 x 1 | F32 |
| 18 | blk.1.ffn_up.weight | Block 1 Feed-Forward Network "Up" (W) | ( ~26M) 26214400 | 2560 x 10240 x 1 x 1 | Q6_K |
| 19 | blk.1.ffn_down.bias | Block 1 Feed-Forward Network "Down" (B) | ( ~3K) 2560 | 2560 x 1 x 1 x 1 | F32 |
| 20 | blk.1.ffn_down.weight | Block 1 Feed-Forward Network "Down" (W) | ( ~26M) 26214400 | 10240 x 2560 x 1 x 1 | Q6_K |
- Total elements in blk.1: (~79M) 78671360
- Percentage of total elements: 2.83%

If you have no other thoughts @compilade, feel free to press merge.
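(As a quick sanity check on that percentage: 78671360 / 0.0283 ≈ 2.78B total elements, which is consistent with phi-2's roughly 2.7B parameters.)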
Still think a generalized approach to sub-column alignment might be appropriate. At least now it's only the Elements column which has problematic alignment.
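To make the discussion concrete, here is a minimal sketch of the kind of sub-column alignment at issue: the abbreviation and the raw count are each right-justified in their own fixed-width field, so both sub-columns (and the column bars) line up across rows. The function name, field widths, and rounding thresholds are assumptions for illustration, not the script's actual code.

```python
def format_elements_cell(count: int, prefix_width: int = 6, digits_width: int = 9) -> str:
    # Abbreviate the count (e.g. 19660800 -> "~20M"), then right-justify the
    # abbreviation and the raw digits in separate fixed-width fields so each
    # sub-column stays vertically aligned across every row of the table.
    for threshold, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M"), (1e3, "K")):
        if count > threshold:
            short = f"~{round(count / threshold)}{suffix}"
            break
    else:
        short = str(count)
    return f"({short.rjust(prefix_width)}) {str(count).rjust(digits_width)}"

print(format_elements_cell(19660800))  # (  ~20M)  19660800
print(format_elements_cell(2560))      # (   ~3K)      2560
```

Because every Elements cell then has the same total width, the markdown pipes stay aligned regardless of the magnitude of the count.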
@compilade I've included your recommendation. How is it now?
Co-authored-by: compilade <[email protected]>
Force-pushed from 7dc405b to 51032b1
Rebased with no modifications to sync against the latest master, to deal with a CI issue.
Will merge soon, as I've addressed all of @compilade's outstanding points and CI has passed.
Table formatting seems good in more cases. Some minor things to fix regarding consistency of the printed text, then this will be good to merge.
Co-authored-by: compilade <[email protected]>
This will allow

$ gguf-dump.py Tinyllama-5M-v0.2-Q8_0.gguf --markdown

to output a markdown-formatted dump that is designed to be as easy as possible to read as a markdown file. It's sent to stdout, so you can do things like

$ gguf-dump.py Tinyllama-5M-v0.2-Q8_0.gguf --markdown | mdless

to render the markdown dump directly. Alternatively, it might be part of your workflow to render this file when creating a new GGUF. Why do this when you can still manually dump it whenever? Well, in a GitHub / Hugging Face repo, it is still good courtesy to have an easy-to-read technical dump of the layers. Note that I also added a function that reads each tensor name and converts it into a human-friendly name, as devs coming across it may not necessarily know what ffn etc. means.
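For illustration, the tensor-name translation mentioned above could look something like the following minimal sketch (the map, regex, and function name here are assumptions, not necessarily the script's actual implementation):

```python
import re

# Hypothetical abbreviation map; the real script may cover more components.
COMPONENT_NAMES = {
    "attn_norm":   "Attention Normalization",
    "attn_qkv":    "Attention Query-Key-Value",
    "attn_output": "Attention Output",
    "ffn_up":      'Feed-Forward Network "Up"',
    "ffn_down":    'Feed-Forward Network "Down"',
}
SUFFIX_NAMES = {"weight": "(W)", "bias": "(B)"}

def human_friendly_name(tensor_name: str) -> str:
    """Translate e.g. 'blk.1.ffn_up.weight' -> 'Block 1 Feed-Forward Network "Up" (W)'."""
    m = re.fullmatch(r"blk\.(\d+)\.(\w+)\.(weight|bias)", tensor_name)
    if m is None:
        return tensor_name  # leave non-block tensors untouched
    block, component, suffix = m.groups()
    friendly = COMPONENT_NAMES.get(component, component)
    return f"Block {block} {friendly} {SUFFIX_NAMES[suffix]}"

print(human_friendly_name("blk.1.ffn_up.weight"))
# Block 1 Feed-Forward Network "Up" (W)
```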
Below is how it currently renders. Feel free to suggest changes to it to make it as useful as possible for you.
Example Output