From 0930736f1553ba5d9dac27ef7feac71d0ce23c9e Mon Sep 17 00:00:00 2001
From: thxCode
Date: Thu, 17 Oct 2024 09:59:42 +0800
Subject: [PATCH] docs: readme

Signed-off-by: thxCode
---
 README.md | 144 +++++++++++++++++++++++++++---------------------------
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/README.md b/README.md
index 26c4587..a3fd719 100644
--- a/README.md
+++ b/README.md
@@ -107,15 +107,15 @@ $ gguf-parser --path="~/.cache/lm-studio/models/NousResearch/Hermes-2-Pro-Mistra
 | llama | 450.50 KiB | 32032 | N/A | 1 | 32000 | N/A | N/A | N/A | N/A | N/A |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

-+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ESTIMATE |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+----------------------------------------------+-------------------------------------+
-| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
-| | | | | | | | | +--------------------+------------+------------+----------------+--------+-----------+
-| | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+
-| llama | 32768 | 2048 / 512 | Disabled | Enabled | No | Supported | 33 (32 + 1) | Yes | 1 + 0 + 0 | 168.25 MiB | 318.25 MiB | 32 + 1 | 4 GiB | 11.16 GiB |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+
++-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ESTIMATE |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+----------------------------------------------+-------------------------------------+
+| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | RERANKING | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
+| | | | | | | | | | +--------------------+------------+------------+----------------+--------+-----------+
+| | | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+
+| llama | 32768 | 2048 / 512 | Disabled | Enabled | No | Unsupported | Supported | 33 (32 + 1) | Yes | 1 + 0 + 0 | 168.25 MiB | 318.25 MiB | 32 + 1 | 4 GiB | 11.16 GiB |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+

 $ # Retrieve the model's metadata via split file,
 $ # which needs all split files has been downloaded.
@@ -145,15 +145,15 @@ $ gguf-parser --path="~/.cache/lm-studio/models/Qwen/Qwen2-72B-Instruct-GGUF/qwe
 | gpt2 | 2.47 MiB | 152064 | N/A | 151643 | 151645 | N/A | N/A | N/A | N/A | 151643 |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

-+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ESTIMATE |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+----------------------------------------------+-------------------------------------+
-| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
-| | | | | | | | | +--------------------+------------+------------+----------------+--------+-----------+
-| | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+
-| qwen2 | 32768 | 2048 / 512 | Disabled | Enabled | No | Unsupported | 81 (80 + 1) | Yes | 1 + 0 + 0 | 291.38 MiB | 441.38 MiB | 80 + 1 | 10 GiB | 73.47 GiB |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+
++-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ESTIMATE |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+----------------------------------------------+-------------------------------------+
+| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | RERANKING | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
+| | | | | | | | | | +--------------------+------------+------------+----------------+--------+-----------+
+| | | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+
+| qwen2 | 32768 | 2048 / 512 | Disabled | Enabled | No | Unsupported | Unsupported | 81 (80 + 1) | Yes | 1 + 0 + 0 | 291.38 MiB | 441.38 MiB | 80 + 1 | 10 GiB | 73.47 GiB |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+

 ```

@@ -185,15 +185,15 @@ $ gguf-parser --url="https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8
 | llama | 449.91 KiB | 32002 | N/A | 1 | 32000 | N/A | N/A | 0 | N/A | 2 |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

-+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ESTIMATE |
-+-------+--------------+--------------------+-----------------+-------------+----------------+---------------+----------------+----------------+----------------------------------------------+----------------------------------------+
-| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
-| | | | | | | | | +--------------------+------------+------------+----------------+-----------+-----------+
-| | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
-+-------+--------------+--------------------+-----------------+-------------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+-----------+-----------+
-| llama | 32768 | 2048 / 512 | Disabled | Unsupported | No | Supported | 33 (32 + 1) | Yes | 1 + 0 + 0 | 269.10 MiB | 419.10 MiB | 32 + 1 | 24.94 GiB | 27.41 GiB |
-+-------+--------------+--------------------+-----------------+-------------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+-----------+-----------+
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ESTIMATE |
++-------+--------------+--------------------+-----------------+-------------+----------------+-------------+---------------+----------------+----------------+----------------------------------------------+----------------------------------------+
+| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | RERANKING | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
+| | | | | | | | | | +--------------------+------------+------------+----------------+-----------+-----------+
+| | | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
++-------+--------------+--------------------+-----------------+-------------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+-----------+-----------+
+| llama | 32768 | 2048 / 512 | Disabled | Unsupported | No | Unsupported | Supported | 33 (32 + 1) | Yes | 1 + 0 + 0 | 269.10 MiB | 419.10 MiB | 32 + 1 | 24.94 GiB | 27.41 GiB |
++-------+--------------+--------------------+-----------------+-------------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+-----------+-----------+

 $ # Retrieve the model's metadata via split file

@@ -222,15 +222,15 @@ $ gguf-parser --url="https://huggingface.co/MaziyarPanahi/Meta-Llama-3.1-405B-In
 | gpt2 | 2 MiB | 128256 | N/A | 128000 | 128009 | N/A | N/A | N/A | N/A | N/A |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

-+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ESTIMATE |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+----------------------------------------------+---------------------------------------+
-| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
-| | | | | | | | | +--------------------+------------+------------+----------------+---------+------------+
-| | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+---------+------------+
-| llama | 131072 | 2048 / 512 | Disabled | Enabled | No | Supported | 127 (126 + 1) | Yes | 1 + 0 + 0 | 652.53 MiB | 802.53 MiB | 126 + 1 | 126 GiB | 299.79 GiB |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+---------+------------+
++-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ESTIMATE |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+----------------------------------------------+---------------------------------------+
+| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | RERANKING | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
+| | | | | | | | | | +--------------------+------------+------------+----------------+---------+------------+
+| | | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+---------+------------+
+| llama | 131072 | 2048 / 512 | Disabled | Enabled | No | Unsupported | Supported | 127 (126 + 1) | Yes | 1 + 0 + 0 | 652.53 MiB | 802.53 MiB | 126 + 1 | 126 GiB | 299.79 GiB |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+---------+------------+

 ```

@@ -262,15 +262,15 @@ $ gguf-parser --hf-repo="openbmb/MiniCPM-Llama3-V-2_5-gguf" --hf-file="ggml-mode
 | gpt2 | 2 MiB | 128256 | N/A | 128000 | 128001 | N/A | N/A | 128002 | N/A | 0 |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

-+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ESTIMATE |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+----------------------------------------------+------------------------------------+
-| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
-| | | | | | | | | +--------------------+------------+------------+----------------+--------+----------+
-| | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+----------+
-| llama | 8192 | 2048 / 512 | Disabled | Enabled | No | Supported | 33 (32 + 1) | Yes | 1 + 0 + 0 | 176.85 MiB | 326.85 MiB | 32 + 1 | 1 GiB | 7.78 GiB |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+----------+
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ESTIMATE |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+----------------------------------------------+------------------------------------+
+| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | RERANKING | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
+| | | | | | | | | | +--------------------+------------+------------+----------------+--------+----------+
+| | | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+----------+
+| llama | 8192 | 2048 / 512 | Disabled | Enabled | No | Unsupported | Supported | 33 (32 + 1) | Yes | 1 + 0 + 0 | 176.85 MiB | 326.85 MiB | 32 + 1 | 1 GiB | 7.78 GiB |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+----------+

 $ # Retrieve the model's metadata via split file

@@ -339,15 +339,15 @@ $ gguf-parser --ms-repo="shaowenchen/chinese-alpaca-2-13b-16k-gguf" --ms-file="c
 | llama | 769.83 KiB | 55296 | N/A | 1 | 2 | N/A | N/A | N/A | N/A | N/A |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

-+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ESTIMATE |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+----------------------------------------------+----------------------------------------+
-| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
-| | | | | | | | | +--------------------+------------+------------+----------------+-----------+-----------+
-| | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+-----------+-----------+
-| llama | 16384 | 2048 / 512 | Disabled | Enabled | No | Supported | 41 (40 + 1) | Yes | 1 + 0 + 0 | 144.95 MiB | 294.95 MiB | 40 + 1 | 12.50 GiB | 22.96 GiB |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+-----------+-----------+
++--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ESTIMATE |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+----------------------------------------------+----------------------------------------+
+| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | RERANKING | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
+| | | | | | | | | | +--------------------+------------+------------+----------------+-----------+-----------+
+| | | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+-----------+-----------+
+| llama | 16384 | 2048 / 512 | Disabled | Enabled | No | Unsupported | Supported | 41 (40 + 1) | Yes | 1 + 0 + 0 | 144.95 MiB | 294.95 MiB | 40 + 1 | 12.50 GiB | 22.96 GiB |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+-----------+-----------+

 ```

@@ -379,15 +379,15 @@ $ gguf-parser --ol-model="llama3.1"
 | gpt2 | 2 MiB | 128256 | N/A | 128000 | 128009 | N/A | N/A | N/A | N/A | N/A |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

-+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ESTIMATE |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+----------------------------------------------+-------------------------------------+
-| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
-| | | | | | | | | +--------------------+------------+------------+----------------+--------+-----------+
-| | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+
-| llama | 131072 | 2048 / 512 | Disabled | Enabled | No | Supported | 33 (32 + 1) | Yes | 1 + 0 + 0 | 403.62 MiB | 553.62 MiB | 32 + 1 | 16 GiB | 29.08 GiB |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+
++-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ESTIMATE |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+----------------------------------------------+-------------------------------------+
+| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | RERANKING | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
+| | | | | | | | | | +--------------------+------------+------------+----------------+--------+-----------+
+| | | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+
+| llama | 131072 | 2048 / 512 | Disabled | Enabled | No | Unsupported | Supported | 33 (32 + 1) | Yes | 1 + 0 + 0 | 403.62 MiB | 553.62 MiB | 32 + 1 | 16 GiB | 29.08 GiB |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+--------+-----------+

 $ # Ollama Model includes the preset params and other artifacts, like multimodal projectors or LoRA adapters,
 $ # you can get the usage of Ollama running by using `--ol-usage` option.
@@ -417,15 +417,15 @@ $ gguf-parser --ol-model="llama3.1" --ol-usage
 | gpt2 | 2 MiB | 128256 | N/A | 128000 | 128009 | N/A | N/A | N/A | N/A | N/A |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

-+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ESTIMATE |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+----------------------------------------------+----------------------------------------+
-| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
-| | | | | | | | | +--------------------+------------+------------+----------------+------------+----------+
-| | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+------------+----------+
-| llama | 2048 | 2048 / 512 | Disabled | Enabled | No | Supported | 33 (32 + 1) | Yes | 1 + 0 + 0 | 151.62 MiB | 301.62 MiB | 32 + 1 | 256.50 MiB | 4.82 GiB |
-+-------+--------------+--------------------+-----------------+-----------+----------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+------------+----------+
++--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ESTIMATE |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+----------------------------------------------+----------------------------------------+
+| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | RERANKING | DISTRIBUTABLE | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
+| | | | | | | | | | +--------------------+------------+------------+----------------+------------+----------+
+| | | | | | | | | | | LAYERS (I + T + O) | UMA | NONUMA | LAYERS (T + O) | UMA | NONUMA |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+------------+----------+
+| llama | 2048 | 2048 / 512 | Disabled | Enabled | No | Unsupported | Supported | 33 (32 + 1) | Yes | 1 + 0 + 0 | 151.62 MiB | 301.62 MiB | 32 + 1 | 256.50 MiB | 4.81 GiB |
++-------+--------------+--------------------+-----------------+-----------+----------------+-------------+---------------+----------------+----------------+--------------------+------------+------------+----------------+------------+----------+

 ```