Commit

remove inaccurate/unnecessary info
Signed-off-by: fria <[email protected]>
friadev authored Nov 14, 2024
1 parent 7e3fcd9 commit e61002d
Showing 1 changed file with 1 addition and 1 deletion.
docs/ai-chat.md

@@ -27,7 +27,7 @@ Alternatively, you can run AI models locally so that your data never leaves your

  Local models are also fairly accessible. It's possible to run smaller models at lower speeds on as little as 8GB of RAM. Using more powerful hardware such as a dedicated GPU with sufficient VRAM or a modern system with fast LPDDR5X memory will offer the best experience.

- LLMs can usually be differentiated by the number of parameters, which can vary between 1.3B to 405B. The higher the number of parameters, the higher the LLM's capabilities. For example, models below 6.7B parameters are only good for basic tasks like text summaries, while models between 7B and 13B are a great compromise between quality and speed. Models with advanced reasoning capabilities are generally around 70B.
+ LLMs can usually be differentiated by the number of parameters. The higher the number of parameters, the higher the LLM's capabilities. For example, models below 6.7B parameters are only good for basic tasks like text summaries, while models between 7B and 13B are a great compromise between quality and speed. Models with advanced reasoning capabilities are generally around 70B.

  For consumer-grade hardware, it is generally recommended to use [quantized models](https://huggingface.co/docs/optimum/en/concept_guides/quantization) for the best balance between model quality and performance. Check out the table below for more precise information about the typical requirements for different sizes of quantized models.

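To make the parameter and quantization guidance in the hunk above concrete: a model's weight footprint is roughly its parameter count multiplied by the bytes stored per parameter, so a 7B model quantized to 4 bits needs on the order of 7 × 10⁹ × 0.5 bytes ≈ 3.5 GB, which is what makes 8GB systems workable. The sketch below is a back-of-the-envelope estimate only; real memory use also includes the context window and runtime overhead.

```python
# Back-of-the-envelope estimate of the RAM needed just to hold model weights.
# Real usage is higher: runtimes also allocate the KV cache, activations,
# and context-window buffers.
def approx_weight_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight size in decimal gigabytes."""
    return params_billions * 1e9 * (bits_per_param / 8) / 1e9

for params in (1.3, 7.0, 13.0, 70.0):
    fp16 = approx_weight_gb(params, 16)  # unquantized half precision
    q4 = approx_weight_gb(params, 4)     # common 4-bit quantization
    print(f"{params:>5}B params: ~{fp16:6.1f} GB at FP16, ~{q4:5.1f} GB at 4-bit")
```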
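As one illustrative way of acting on the quantization recommendation, here is a minimal sketch using the third-party llama-cpp-python bindings to load a 4-bit quantized GGUF file. The model path is an assumption, not a real file, and the exact API can vary between versions.

```python
# Minimal sketch: running a quantized local model with llama-cpp-python
# (pip install llama-cpp-python). The model path below is hypothetical;
# substitute any GGUF file you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-7b-chat.Q4_K_M.gguf",  # hypothetical path
    n_ctx=2048,  # context window size in tokens
)

output = llm(
    "Summarize the privacy benefits of running AI models locally.",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```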
