Commit
add open-webui, ramalama, llamacpp example
Signed-off-by: sallyom <[email protected]>
Showing 4 changed files with 171 additions and 0 deletions.
@@ -0,0 +1,84 @@
## Self-hosted AI Platform: Open-WebUI, Llamacpp model server, and Ramalama storage

If you've checked out [open-webui](https://docs.openwebui.com/), chances are you got up and running quickly with `ollama`. `open-webui` has a ton of
cool features, like enabling web search and RAG in chats. I love `open-webui`, but I don't love all the extra baggage that comes with `ollama`. I
prefer to use [ramalama](https://github.com/containers/ramalama) to manage local storage of models. Ramalama also incorporates llamacpp to serve
models. However, in this example, I'm using `ramalama` for storage and a standalone `llamacpp-python` model server built from
[../../model_servers/llamacpp_python/base/Containerfile](../../model_servers/llamacpp_python/base/Containerfile). A simple `make build` from the
`../../model_servers/llamacpp_python` directory will build this image for you. You can also feel free to use the public
`quay.io/sallyom/llamacpp-python:latest` image if you are running on `arm64` macOS. I haven't pushed the image for `amd64` yet.
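
If you'd rather build the image yourself, here is a minimal sketch (assuming you start from the root of this repository and use `podman`; `make build` is the target mentioned above):

```bash
# Build the llamacpp-python model server image locally using the repo's Makefile.
cd model_servers/llamacpp_python
make build
```

If you build locally, remember to update the `image:` of the `llamacpp-python` container in [./open-webui-llamacpp.yaml](./open-webui-llamacpp.yaml) to point at your local tag instead of the public quay.io image.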

### Pull models from huggingface using `ramalama`

#### Install ramalama

From the root of this repository:

```bash
cd ../ && git clone [email protected]:containers/ramalama && cd ramalama
python -m venv venv
source venv/bin/activate
pip install ramalama
# To view the short names that ramalama knows about:
cat ./venv/share/ramalama/shortnames.conf

# run `deactivate` to leave the virtualenv
# run something like `sudo cp ./bin/ramalama /usr/local/bin/` to add ramalama to PATH
```
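
Optionally, a quick sanity check that the CLI is available (inside the activated virtualenv, or after copying `ramalama` onto your `PATH`):

```bash
# Confirm the ramalama CLI resolves and prints its usage.
command -v ramalama
ramalama --help
```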

#### Pull models with ramalama

It's assumed you've added `./bin/ramalama` to your `$PATH`.
Included in this folder is a short list of short names used in this example. See [./shortnames.conf](./shortnames.conf).

```bash
cp ./shortnames.conf ~/.config/ramalama/shortnames.conf
ramalama pull llama3
ramalama pull hermes
ramalama pull granite-code # I've had issues with this model, might need to use a different version

cd ../ai-lab-recipes
```
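
To confirm the models landed in local storage, something like the following should work (`ramalama list` and the `~/.local/share/ramalama` location are assumptions based on ramalama's defaults; adjust if your install differs):

```bash
# List the models ramalama knows about locally.
ramalama list

# Peek at the GGUF files on disk; this is ramalama's default model store.
ls -R ~/.local/share/ramalama/models/huggingface
```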

Create a podman volume backed by your local ramalama GGUF storage:

```bash
podman volume create --opt type=none --opt o=bind,ro --opt device=/abs/path/to/.local/share/ramalama ramalama-gguf
```

To inspect this volume, you can run:

```bash
podman volume list
podman volume inspect ramalama-gguf
```
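
To double-check that the bind mount actually exposes your GGUF files, you can list the volume's contents from a throwaway container (a sketch; any small image with `ls`, such as `ubi-minimal` here, will do):

```bash
# Mount the volume read-only into a scratch container and list the model files.
podman run --rm -v ramalama-gguf:/ramalama:ro registry.access.redhat.com/ubi9/ubi-minimal \
  ls -R /ramalama/models
```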

Update [open-webui-llamacpp.yaml](./open-webui-llamacpp.yaml) at the `TODO: #L53-#L57` to specify the hostPath location of your [model-config](./model-config).
Update the model-config to point to the locations of the GGUF files in your ramalama storage. If you follow this example exactly, you shouldn't need any changes.
The llamacpp server uses this config file to find models. More information about this feature is available
[here](https://llama-cpp-python.readthedocs.io/en/latest/server/#configuration-and-multi-model-support).
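
A quick way to sanity-check the config before starting the pod (a sketch; it assumes `jq` is installed, that you run it from the repo root, and that your ramalama store is at the default `~/.local/share/ramalama`):

```bash
# For each model path in model-config (paths are as seen inside the pod, where
# ramalama storage is mounted at /ramalama), verify the matching file exists
# in the local ramalama store.
RAMALAMA_STORE=~/.local/share/ramalama   # adjust if your store lives elsewhere
jq -r '.models[].model' platforms/ramalama-open-webui/model-config | while read -r m; do
  local_path="${RAMALAMA_STORE}${m#/ramalama}"
  if [ -f "$local_path" ]; then echo "OK       $m"; else echo "MISSING  $m"; fi
done
```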

Then, run the pod locally with podman:

```bash
podman kube play platforms/ramalama-open-webui/open-webui-llamacpp.yaml

# to see logs, check status, etc.
podman pod list
podman ps
podman logs [openwebui container]
podman logs [llamacpp container]
```

Access `open-webui` at `http://localhost:3000`.

Access the llamacpp server at `http://localhost:9999/v1/models` to see the list of models.
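
A couple of quick smoke tests against the llamacpp-python server's OpenAI-compatible API (the `llama3` alias comes from [./model-config](./model-config)):

```bash
# List the models the server loaded from model-config.
curl -s http://localhost:9999/v1/models

# Send a tiny chat completion to the llama3 alias.
curl -s http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
```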

You can now interact with the open-webui features. Note that I've disabled the ollama API, so open-webui only uses the generic OpenAI API. I do not have ollama
running locally, and I don't need it. If I want to download any other models, I can use `ramalama pull` and then update the `model-config` file that the
llamacpp-python server uses to locate the models.
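
For example, here is a sketch of adding a fourth model; `tinyllama` already has a short name in [./shortnames.conf](./shortnames.conf), and `podman kube down` is assumed to be available in your podman version:

```bash
# Pull another model that already has a short name defined in ./shortnames.conf.
ramalama pull tinyllama

# Add a matching entry (model path plus model_alias "tinyllama") to model-config,
# then recreate the pod so llamacpp-python re-reads the config.
podman kube down platforms/ramalama-open-webui/open-webui-llamacpp.yaml
podman kube play platforms/ramalama-open-webui/open-webui-llamacpp.yaml
```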

Now check out the [open-webui documentation](https://docs.openwebui.com/) to start playing with your full-featured local AI platform!
The beefier your local system is, the more fun it will be. I'm on a Mac M2, so I hit some limitations pretty quickly when using
open-webui's various features.
@@ -0,0 +1,24 @@
{
  "host": "0.0.0.0",
  "port": 8080,
  "models": [
    {
      "model": "/ramalama/models/huggingface/bartowski/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf",
      "model_alias": "llama3",
      "n_gpu_layers": -1,
      "n_ctx": 2048
    },
    {
      "model": "/ramalama/models/huggingface/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
      "model_alias": "hermes",
      "n_gpu_layers": -1,
      "n_ctx": 2048
    },
    {
      "model": "/ramalama/models/huggingface/ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
      "model_alias": "granite-code",
      "n_gpu_layers": -1,
      "n_ctx": 2048
    }
  ]
}
@@ -0,0 +1,57 @@
apiVersion: v1
kind: Pod
metadata:
  name: webui-llamacpp-pod
spec:
  containers:
  - name: open-webui
    image: ghcr.io/open-webui/open-webui:main
    ports:
    - containerPort: 8001
      hostPort: 3000
    env:
    - name: ENABLE_OLLAMA_API
      value: "false"
    - name: WEBUI_AUTH
      value: "false"
    - name: PORT
      value: "8001"
    - name: OPENAI_API_BASE_URL
      value: "http://0.0.0.0:8080/v1"
    - name: OPENAI_API_KEY
      value: "empty"
    volumeMounts:
    - name: open-webui-data
      mountPath: /app/backend/data
    - name: ramalama-gguf
      mountPath: /ramalama
    - name: model-config
      mountPath: /configs/model-config
  - name: llamacpp-python
    image: quay.io/sallyom/llamacpp-python:latest
    ports:
    - containerPort: 8080
      hostPort: 9999
    env:
    - name: CONFIG_PATH
      value: /configs/model-config
    - name: CONFIG_FILE
      value: /configs/model-config
    volumeMounts:
    - name: ramalama-gguf
      mountPath: /ramalama
    - name: model-config
      mountPath: /configs/model-config
  volumes:
  - name: open-webui-data
    persistentVolumeClaim:
      claimName: open-webui
  - name: ramalama-gguf
    persistentVolumeClaim:
      claimName: ramalama-gguf
  - name: model-config
    # TODO: UPDATE THIS TO MATCH YOUR FILESYSTEM
    # model-config example is in this folder at ./model-config
    hostPath:
      path: /Users/somalley/git/containers/ai-lab-recipes/platforms/ramalama-open-webui/model-config
      type: File
@@ -0,0 +1,6 @@
[shortnames]
"llama3" = "huggingface://bartowski/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf"
"tinyllama" = "huggingface://TinyLlama/TinyLlama-1.1B-Chat-v0.2-GGUF/ggml-model-q4_0.gguf"
"hermes" = "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf"
"granite-code" = "huggingface://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf"