SpQR compression method #240

Open
JianbangZ opened this issue Jun 9, 2023 · 2 comments

Comments

@JianbangZ

How feasible would it be to implement SpQR in ggml?
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

@gardner

gardner commented Jun 12, 2023

The paper: https://arxiv.org/pdf/2306.03078.pdf

The code: https://github.com/Vahe1994/SpQR

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this issue Dec 18, 2023
@PoignardAzur

Given this comment: ggerganov/llama.cpp#1602 (comment), it seems unlikely SpQR is going to be implemented any time soon:

The main idea of the SpQR paper is to separate "outliers". This has been tried as part of k-quants development and has been shown to be less effective; see for instance ggerganov/llama.cpp#1595 (comment) in ggerganov/llama.cpp#1595.
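
For readers unfamiliar with the idea, here is a toy sketch of what "separating outliers" can look like. It is not the SpQR algorithm and not anything from ggml: the function names, the magnitude-based outlier choice, and the plain min-max grid are all assumptions made for illustration. A few large weights are stored unquantized, so the low-bit grid only has to cover the remaining, narrower range.

```python
def quantize_with_outliers(weights, bits=4, num_outliers=1):
    """Toy outlier-separating quantizer (illustrative only, not SpQR itself)."""
    n = len(weights)
    # Assumption: outliers are simply the largest-magnitude weights.
    by_magnitude = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    outliers = {i: weights[i] for i in by_magnitude[:num_outliers]}
    inliers = [w for i, w in enumerate(weights) if i not in outliers]
    # Uniform min-max grid over the non-outlier weights only.
    lo, hi = (min(inliers), max(inliers)) if inliers else (0.0, 0.0)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [0 if i in outliers else round((w - lo) / scale)
             for i, w in enumerate(weights)]
    return codes, scale, lo, outliers


def dequantize(codes, scale, lo, outliers):
    # Outliers are restored exactly; everything else comes from the grid.
    return [outliers.get(i, lo + c * scale) for i, c in enumerate(codes)]


def max_abs_error(weights, **kwargs):
    recon = dequantize(*quantize_with_outliers(weights, **kwargs))
    return max(abs(a - b) for a, b in zip(weights, recon))
```

Calling `max_abs_error(w, num_outliers=0)` and `max_abs_error(w, num_outliers=1)` on a list containing one very large value shows the effect: with the outlier kept aside, the grid step shrinks and so does the worst-case error. The cost is the extra storage for the sparse outlier entries, which is the trade-off under discussion here.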

If we read the SpQR paper more carefully, we find that what they mean by "nearly lossless compression" is to arrive at a quantized perplexity within 1% of the full model. The Q4_K_M variant of k-quants does that for ggml; see for instance PR ggerganov/llama.cpp#1684.
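
The "within 1%" criterion is just a relative difference between two perplexity values. A minimal sketch follows; the function names and the default tolerance argument are assumptions for illustration, and the inputs should come from your own perplexity runs.

```python
def relative_ppl_increase(ppl_full, ppl_quant):
    """Relative perplexity increase of the quantized model over the full model."""
    return (ppl_quant - ppl_full) / ppl_full


def is_near_lossless(ppl_full, ppl_quant, tolerance=0.01):
    # True when the quantized perplexity is at most `tolerance` (1% by default)
    # above the full model's perplexity.
    return relative_ppl_increase(ppl_full, ppl_quant) <= tolerance
```

For example, with placeholder values (not measurements), `is_near_lossless(10.0, 10.05)` passes the check while `is_near_lossless(10.0, 10.2)` does not.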

We can probably close this issue.
