GGUF Parser

tl;dr, a Go parser for GGUF. Review/check GGUF files and estimate their memory usage and maximum tokens per second.


GGUF is a file format for storing models for inference with GGML and GGML-based executors. It is a binary format designed for fast loading and saving of models, and for ease of reading. Models are traditionally developed using PyTorch or another framework, and then converted to GGUF for use with GGML.

GGUF Parser provides functions to parse GGUF files in Go for the following purposes:

  • Read metadata from a GGUF file without downloading the whole model from the remote.
  • Estimate the model's memory usage.

Import the package as below.

go get github.com/gpustack/gguf-parser-go

If you need a one-shot command-line tool, try gguf-parser from the releases, or install it from HEAD with go install github.com/gpustack/gguf-parser-go/cmd/gguf-parser.
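The CLI can then inspect a file directly. The flag below is an assumption for illustration (run gguf-parser --help for the exact set); inspecting a local file might look like:

gguf-parser --path="path/to/model.gguf"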

Calls

Loading
flowchart
    parseGGUFFileRemote[/parseGGUFFileRemote/]
    parseGGUFFile[/parseGGUFFile/]
    ParseGGUFFile -.-> parseGGUFFile
    ParseGGUFFileFromHuggingFace -.-> ParseGGUFFileRemote
    ParseGGUFFileFromModelScope -.-> ParseGGUFFileRemote
    ParseGGUFFileRemote -.-> parseGGUFFileRemote
    parseGGUFFileRemote -.-> parseGGUFFile
    ParseGGUFFileFromOllama -.-> ParseGGUFFileFromOllamaModel
    ParseGGUFFileFromOllamaModel -.-> parseGGUFFileRemote
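
As the diagram shows, models can also be loaded directly from Hugging Face, ModelScope, or an Ollama registry. A minimal sketch, assuming ParseGGUFFileFromHuggingFace takes a context, repository, and file name, and ParseGGUFFileFromOllama takes a context and model tag (the repository, file, and model names below are placeholders):

import (
    "context"

    "github.com/davecgh/go-spew/spew"
    . "github.com/gpustack/gguf-parser-go"
)

ctx := context.Background()

// Load a GGUF file hosted on Hugging Face by repository and file name.
hf, err := ParseGGUFFileFromHuggingFace(ctx, "owner/repo-GGUF", "model.Q4_K_M.gguf")
if err != nil {
    panic(err)
}
spew.Dump(hf.Model())

// Load a model published to an Ollama registry by its model tag.
ol, err := ParseGGUFFileFromOllama(ctx, "llama3")
if err != nil {
    panic(err)
}
spew.Dump(ol.Model())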

Examples

Load model

import (
    "github.com/davecgh/go-spew/spew"
    . "github.com/gpustack/gguf-parser-go"
)

f, err := ParseGGUFFile("path/to/model.gguf")
if err != nil {
    panic(err)
}

spew.Dump(f)

Use MMap

f, err := ParseGGUFFile("path/to/model.gguf", UseMMap())
if err != nil {
    panic(err)
}

Skip large metadata

f, err := ParseGGUFFile("path/to/model.gguf", SkipLargeMetadata())
if err != nil {
    panic(err)
}

Load model from remote

import (
    "context"
    "github.com/davecgh/go-spew/spew"
    . "github.com/gpustack/gguf-parser-go"
)

f, err := ParseGGUFFileRemote(context.Background(), "https://example.com/model.gguf")
if err != nil {
    panic(err)
}

spew.Dump(f)

Adjust the request buffer size

f, err := ParseGGUFFileRemote(context.Background(), "https://example.com/model.gguf", UseBufferSize(1 * 1024 * 1024) /* 1M */)
if err != nil {
    panic(err)
}

View information

// Model
spew.Dump(f.Model())

// Architecture
spew.Dump(f.Architecture())

// Tokenizer
spew.Dump(f.Tokenizer())

Estimate usage in llama.cpp

The estimated result is close to the actual usage when running with llama-cli (examples/main/main.cpp).

es := f.EstimateLLaMACppUsage()
spew.Dump(es)

// Since the estimated result is detailed and hard to digest directly,
// you can summarize it as below.
s := es.Summarize(true /* load via mmap */, 0, 0 /* no additional non-UMA RAM, VRAM footprint */)
spew.Dump(s)

Estimate with larger prompt

es := f.EstimateLLaMACppUsage(WithContextSize(4096) /* use a 4k context */)
spew.Dump(es)

// Since the estimated result is detailed and hard to digest directly,
// you can summarize it as below.
s := es.Summarize(true /* load via mmap */, 0, 0 /* no additional non-UMA RAM, VRAM footprint */)
spew.Dump(s)

Estimate with specific offload layers

es := f.EstimateLLaMACppUsage(WithOffloadLayers(10) /* offload the last 10 layers to the GPU */)
spew.Dump(es)

// Since the estimated result is detailed and hard to digest directly,
// you can summarize it as below.
s := es.Summarize(true /* load via mmap */, 0, 0 /* no additional non-UMA RAM, VRAM footprint */)
spew.Dump(s)
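
The options can also be combined. A sketch using only the options shown above, estimating a 4k-context run with the last 10 layers offloaded to the GPU:

es := f.EstimateLLaMACppUsage(WithContextSize(4096), WithOffloadLayers(10))
spew.Dump(es)

// Summarize as before.
s := es.Summarize(true /* load via mmap */, 0, 0 /* no additional non-UMA RAM, VRAM footprint */)
spew.Dump(s)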

License

MIT