Merge pull request #356 from janhq/chore/documentation-0.2.10
Documentation 0.2.10
Showing 7 changed files with 75 additions and 5 deletions.
@@ -0,0 +1,33 @@
---
title: GBNF Grammar
description: What Nitro supports
keywords: [Nitro, Jan, fast inference, inference server, local AI, large language model, OpenAI compatible, open source, llama]
---

## GBNF Grammar

GBNF (GGML BNF) makes it easy to set rules for how a model talks or writes. Think of it like teaching the model to always speak correctly, whether in emoji or proper JSON format.

Backus-Naur Form (BNF) is a way to describe the rules of computer languages, file formats, and protocols. GBNF builds on BNF, adding modern features similar to those found in regular expressions.

In GBNF, we create rules (production rules) to guide how a model forms its responses. These rules use a mix of fixed characters (like letters or emojis) and flexible parts that can change. Each rule follows the format: `nonterminal ::= sequence...`.
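
For example, a minimal grammar that forces the model to answer only "yes" or "no" might look like this (an illustrative sketch, not taken from the Nitro docs; see the llama.cpp guide below for the full syntax):

```text title="yes_no.gbnf"
# root is the entry point: every model response must match this rule.
root ::= "yes" | "no"
```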

To get a clearer picture, check out [this guide](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md).

## Use GBNF Grammar in Nitro

To make your Nitro model follow specific speaking or writing rules, use this command:

```bash title="Nitro Inference With Grammar" {10}
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Who won the world series in 2020?"
      }
    ],
    "grammar_file": "/path/to/grammarfile"
  }'
```
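
As a concrete setup (the path is illustrative, reusing the yes/no grammar sketched above), you could first write the grammar to disk and then point `grammar_file` at it, e.g. `"grammar_file": "/tmp/yes_no.gbnf"`:

```bash title="Write an example grammar file"
# Save a minimal yes/no grammar to an illustrative path.
cat > /tmp/yes_no.gbnf <<'EOF'
root ::= "yes" | "no"
EOF
```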
@@ -0,0 +1,29 @@
---
title: Self-Extend
description: Self-Extend LLM Context Window Without Tuning
keywords: [long context, longlm, Nitro, Jan, fast inference, inference server, local AI, large language model, OpenAI compatible, open source, llama]
---

## Enhancing LLMs with Self-Extend

Self-Extend offers an innovative approach to increasing the context window of Large Language Models (LLMs) without the usual need for re-tuning. It adapts the attention mechanism during the inference phase, eliminating the need for additional training or fine-tuning.

For in-depth technical insights, refer to the research [paper](https://arxiv.org/pdf/2401.01325.pdf).

## Activating Self-Extend for LLMs

To activate the Self-Extend feature while loading your model, use the following command:

```bash title="Enable Self-Extend" {6,7}
curl http://localhost:3928/inferences/llamacpp/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 8192,
    "grp_attn_n": 4,
    "grp_attn_w": 2048
  }'
```

**Note:**
- For optimal performance, `grp_attn_w` should be as large as possible, but smaller than the training context length.
- Setting `grp_attn_n` between 2 and 4 is recommended for peak efficiency; higher values may make the output increasingly incoherent.
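
As a worked example of these rules (the 2048-token training context is an assumption for illustration, not a value from the Nitro docs), a model trained on 2048 tokens would need `grp_attn_w` below 2048, while `grp_attn_n` stays in the recommended range:

```bash title="Illustrative settings for a 2048-context model"
# Assumed training context: 2048 tokens.
# grp_attn_w: large, but smaller than the training context -> 1024.
# grp_attn_n: 4, within the recommended 2-4 range.
curl http://localhost:3928/inferences/llamacpp/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 8192,
    "grp_attn_n": 4,
    "grp_attn_w": 1024
  }'
```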