From 86a43fea680d6fce6d8b8eb931437fa230af7e47 Mon Sep 17 00:00:00 2001
From: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
Date: Sat, 3 Aug 2024 17:26:17 +0200
Subject: [PATCH] Models: Add Phi-3-mini-128k-instruct

Resolves https://github.com/nomic-ai/gpt4all/issues/2668

Adds model support for [Phi-3-mini-128k-instruct](https://huggingface.co/GPT4All-Community/Phi-3-mini-128k-instruct)

### Description of Model

At the time of writing, the model scores well on benchmarks for its parameter count. It claims to support a context window of up to 128K tokens.

- The model was trained/finetuned on English
- License: MIT

### Personal Impression:
For a 3.8-billion-parameter model, the output is reasonable. It can hold a conversation and follow instructions. In one conversation that reached 24k characters, it still answered "what is 2x2?" correctly, although response quality understandably degrades slightly at that context size. I have seen refusals on certain tasks, and the model appears to be finetuned with a particular alignment. Its long context and response quality make it a good model if you can live with its alignment or if your use case falls within the model's originally intended use cases. It will mainly appeal to English-speaking users.

### Critique:

This model does not support Grouped Query Attention (GQA). Models that do support GQA may therefore need less RAM/VRAM for the same number of tokens in the context window. It has been claimed that llama-3-8b (which supports GQA) needs less RAM than this model beyond a certain point (\~8k context).
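
As a rough sanity check of that \~8k figure, the sketch below estimates memory as quantized weights plus an f16 KV cache. Everything except the Phi-3 `filesize` is an assumption: the layer counts, KV head counts, and head dimensions are taken from my reading of the public model configs, the llama-3-8b Q4_0 size is approximate, and compute buffers and other runtime overhead are ignored. The function and constant names are hypothetical and not part of GPT4All.

```python
# Back-of-the-envelope RAM estimate: quantized weights + f16 KV cache.
# All shapes are ASSUMPTIONS from public model configs, not measured in GPT4All.


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Bytes of KV cache added per context token (keys + values, every layer)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Assumed shapes: Phi-3-mini has no GQA (32 KV heads), llama-3-8b uses 8 KV heads.
PHI3_MINI = {"n_layers": 32, "n_kv_heads": 32, "head_dim": 96}
LLAMA3_8B = {"n_layers": 32, "n_kv_heads": 8, "head_dim": 128}

PHI3_WEIGHTS = 2_176_177_120    # "filesize" of the Q4_0 GGUF added in this PR
LLAMA3_WEIGHTS = 4_660_000_000  # ASSUMED approximate size of a llama-3-8b Q4_0 GGUF


def estimate_bytes(weights: int, shape: dict, n_tokens: int) -> int:
    """Weights plus KV cache for a context of n_tokens."""
    return weights + n_tokens * kv_bytes_per_token(**shape)


def crossover_tokens() -> float:
    """Context length beyond which Phi-3-mini needs more memory than llama-3-8b."""
    extra_weights = LLAMA3_WEIGHTS - PHI3_WEIGHTS
    extra_kv_per_token = (kv_bytes_per_token(**PHI3_MINI)
                          - kv_bytes_per_token(**LLAMA3_8B))
    return extra_weights / extra_kv_per_token


if __name__ == "__main__":
    print("Phi-3-mini KV bytes/token:", kv_bytes_per_token(**PHI3_MINI))
    print("llama-3-8b KV bytes/token:", kv_bytes_per_token(**LLAMA3_8B))
    print("Estimated crossover (tokens):", round(crossover_tokens()))
```

Under these assumptions Phi-3-mini spends 384 KiB of KV cache per token against 128 KiB for llama-3-8b, and the two estimates cross at roughly 9.5k tokens, which is in the same range as the claim above. Real numbers will differ with KV cache precision and backend overhead.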

### Motivation for this pull-request

- The model is small and fits into 3GB of VRAM or 4GB of RAM (I set 8GB of RAM as the minimum, as the operating system and other apps also need some)
- The model claims long context and it delivers (although with high RAM usage in longer conversations).
- AFAIK, apart from the Qwen1.5 and Qwen2 model families, this is the only general-purpose model family below 4B parameters that offers such a large context window and is also compatible with GPT4All
- For its size, it ranks high on the Hugging Face Open LLM Leaderboard
- It is made by Microsoft, so the model comes with an established reputation
- Users were asking for this model


## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] I have added thorough documentation for my code.
- [x] I have tagged PR with relevant project labels. I acknowledge that a PR without labels may be dismissed.
- [ ] If this PR addresses a bug, I have provided both a screenshot/video of the original bug and the working solution.

Signed-off-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
---
 gpt4all-chat/metadata/models3.json | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
diff --git a/gpt4all-chat/metadata/models3.json b/gpt4all-chat/metadata/models3.json
index 392d5baae96e..b445c671f8aa 100644
--- a/gpt4all-chat/metadata/models3.json
+++ b/gpt4all-chat/metadata/models3.json
@@ -402,5 +402,21 @@
     "url": "https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/resolve/main/qwen2-1_5b-instruct-q4_0.gguf",
     "promptTemplate": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n%2<|im_end|>",
     "systemPrompt": "<|im_start|>system\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.<|im_end|>\n"
+  },
+  {
+    "order": "za",
+    "md5sum": "fb84496bb990b6ab2b4f4d667c80fd13",
+    "name": "Phi-3-mini-128k-instruct",
+    "filename": "Phi-3-mini-128k-instruct-Q4_0.gguf",
+    "filesize": "2176177120",
+    "requires": "3.1",
+    "ramrequired": "8",
+    "parameters": "3.8 billion",
+    "quant": "q4_0",
+    "type": "phi3",
+    "description": "<ul><li>Very fast responses</li><li>Instruction based model</li><li>Usage of LocalDocs (RAG): Yes</li><li>Supports context length of up to 131072</li><li>Trained and finetuned by Microsoft</li><li>License: <a href=\"https://opensource.org/license/mit\">MIT</a></li></ul>",
+    "url": "https://huggingface.co/GPT4All-Community/Phi-3-mini-128k-instruct/resolve/main/Phi-3-mini-128k-instruct-Q4_0.gguf",
+    "promptTemplate": "<|user|>\n%1<|end|>\n<|assistant|>\n%2<|end|>\n",
+    "systemPrompt": "<|system|>You are a helpful assistant.<|end|>\n"
   }
 ]