add open-webui, ramalama, llamacpp example
Signed-off-by: sallyom <[email protected]>
sallyom committed Jan 20, 2025
1 parent 11fc246 commit 9f909ae
Showing 4 changed files with 171 additions and 0 deletions.
84 changes: 84 additions & 0 deletions platforms/ramalama-open-webui/README.md
@@ -0,0 +1,84 @@
## Self-hosted AI Platform: Open-WebUI, Llamacpp model server, and Ramalama storage

If you've checked out [open-webui](https://docs.openwebui.com/), chances are you got up and running quickly with `ollama`. `open-webui` has a ton of
cool features, like web search and RAG in chats. I love `open-webui`, but I don't love all the extra baggage that comes with `ollama`. I
prefer to use [ramalama](https://github.com/containers/ramalama) to manage local storage of models. Ramalama can also serve models with llamacpp.
In this example, however, I'm using `ramalama` only for storage, together with a standalone `llamacpp-python` model server built from
[../../model_servers/llamacpp_python/base/Containerfile](../../model_servers/llamacpp_python/base/Containerfile). Running `make build` from the
`../../model_servers/llamacpp_python` directory builds this image for you. You can also use the public
`quay.io/sallyom/llamacpp-python:latest` image if you are running on `arm64` macOS. I haven't pushed the image for `amd64` yet.

### Pull models from huggingface using `ramalama`

#### Install ramalama

From the root of this repository:

```bash
cd ../ && git clone git@github.com:containers/ramalama && cd ramalama
python -m venv venv
source venv/bin/activate
pip install ramalama
# To view the short names that ramalama knows about, see
# ./venv/share/ramalama/shortnames.conf

# run `deactivate` to leave virtualenv
# run something like `sudo cp ./bin/ramalama /usr/local/bin/` to add ramalama to PATH
```

#### Pull models with ramalama

It's assumed you've added `./bin/ramalama` to your `$PATH`.
This folder includes the short names used in this example. See [./shortnames.conf](./shortnames.conf).

```bash
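# shortnames.conf is in this folder (platforms/ramalama-open-webui); adjust the source path if you run this from elsewhere
mkdir -p ~/.config/ramalama
cp ./shortnames.conf ~/.config/ramalama/shortnames.conf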
ramalama pull llama3
ramalama pull hermes
ramalama pull granite-code # I've had issues with this model, might need to use a different version

cd ../ai-lab-recipes
```

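Create a podman volume that exposes the ramalama store and its GGUF files. Replace `/abs/path/to` with the absolute path of the directory that contains your `.local/share/ramalama` (typically your home directory):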

```bash
podman volume create --opt type=none --opt o=bind,ro --opt device=/abs/path/to/.local/share/ramalama ramalama-gguf
```
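
The `device` option needs an absolute path, so the placeholder has to be filled in by hand. As a minimal sketch, assuming the store lives at `~/.local/share/ramalama` (the location implied by the command above), the following Python helper builds the same `podman volume create` command with the path resolved. The function name `volume_create_cmd` is made up for illustration.

```python
import shlex
from pathlib import Path


def volume_create_cmd(store: Path, name: str = "ramalama-gguf") -> list[str]:
    """Build the `podman volume create` command for a read-only bind volume."""
    # Expand `~` and resolve to an absolute path, as the device option requires.
    device = store.expanduser().resolve()
    return [
        "podman", "volume", "create",
        "--opt", "type=none",
        "--opt", "o=bind,ro",
        "--opt", f"device={device}",
        name,
    ]


if __name__ == "__main__":
    # Assumed store location; change it if your models live elsewhere.
    cmd = volume_create_cmd(Path("~/.local/share/ramalama"))
    print(shlex.join(cmd))
```

The script only prints the command, so you can review it before pasting it into a shell.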

To inspect the volume, run:

```bash
podman volume list
podman volume inspect ramalama-gguf
```

Update [open-webui-llamacpp.yaml](./open-webui-llamacpp.yaml) at the `TODO: #L53-#L57` so that the hostPath points to the location of your [model-config](./model-config).
Then update the model-config so that it points to the gguf files in your ramalama storage. The paths in it are container paths under `/ramalama`, where the `ramalama-gguf` volume is mounted. If you are following this example exactly, you shouldn't need to change it.
The llamacpp server uses this config file to find models. More information about this feature is available
[here](https://llama-cpp-python.readthedocs.io/en/latest/server/#configuration-and-multi-model-support).
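
Because the paths in `model-config` are container paths, a typo only shows up once the pod is running. The sketch below is one way to check them from the host beforehand: it maps each `/ramalama/...` path back onto the host store and reports the aliases whose GGUF file is missing. The helper name `missing_models` and the default store location `~/.local/share/ramalama` are assumptions, not part of this example's files.

```python
import json
from pathlib import Path


def missing_models(config_path: Path, host_store: Path, mount: str = "/ramalama") -> list[str]:
    """Return the model aliases whose GGUF file is not found in the host store."""
    config = json.loads(config_path.read_text())
    missing = []
    for entry in config["models"]:
        # Paths in model-config are container paths; map them back to the host.
        # relative_to() raises ValueError if a path is not under the mount point.
        relative = Path(entry["model"]).relative_to(mount)
        if not (host_store / relative).is_file():
            missing.append(entry.get("model_alias", entry["model"]))
    return missing


if __name__ == "__main__":
    store = Path.home() / ".local/share/ramalama"  # assumed store location
    config = Path("platforms/ramalama-open-webui/model-config")
    if config.is_file():
        print("missing models:", missing_models(config, store))
```

An empty list means every model referenced in the config was found in the store.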


Then, run the pod locally with podman.

```bash
podman kube play platforms/ramalama-open-webui/open-webui-llamacpp.yaml

# to see logs, check status, etc
podman pod list
podman ps
podman logs [openwebui container]
podman logs [llamacpp container]
```

Access `open-webui` at `http://localhost:3000`.

Access the llamacpp server at `http://localhost:9999/v1/models` to see the list of models.
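
The same check can be scripted. The following is a minimal sketch that assumes the server returns the usual OpenAI-style listing, a JSON object with a `data` array whose items carry an `id` field; the function names are illustrative only.

```python
import json
from urllib.request import urlopen


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style `/v1/models` response."""
    return [item["id"] for item in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:9999") -> list[str]:
    """Query the llamacpp server published on host port 9999."""
    with urlopen(f"{base_url}/v1/models", timeout=5) as response:
        return model_ids(json.load(response))


if __name__ == "__main__":
    try:
        print(fetch_model_ids())
    except OSError as err:  # server not running or not reachable
        print(f"could not reach the model server: {err}")
```

If the pod is up, the printed IDs should correspond to the entries in `model-config`.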

You can now interact with the open-webui features. Note that I've disabled the ollama API, so open-webui only uses the generic OpenAI API. I do not have ollama
running locally, and I don't need it. To download other models, I can use `ramalama pull` and then update the `model-config` file that
the llamacpp-python server uses to locate the models.
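
Updating `model-config` by hand works fine; for repeated use, a small script can do it. The sketch below appends an entry with the same `n_gpu_layers` and `n_ctx` values as the existing ones. The helper `add_model` is hypothetical, and the `tinyllama` path in the usage section is an assumption that follows the storage layout of the other entries; check where `ramalama pull tinyllama` actually places the file.

```python
import json
from pathlib import Path


def add_model(config_path: Path, gguf_path: str, alias: str) -> None:
    """Append a model entry to model-config unless the alias already exists."""
    config = json.loads(config_path.read_text())
    if any(m.get("model_alias") == alias for m in config["models"]):
        return  # alias already present, leave the file untouched
    config["models"].append(
        {
            "model": gguf_path,  # container path under /ramalama
            "model_alias": alias,
            "n_gpu_layers": -1,
            "n_ctx": 2048,
        }
    )
    config_path.write_text(json.dumps(config, indent=2) + "\n")


if __name__ == "__main__":
    config = Path("platforms/ramalama-open-webui/model-config")
    if config.is_file():
        # Assumed path, mirroring the layout of the existing entries.
        add_model(
            config,
            "/ramalama/models/huggingface/TinyLlama/TinyLlama-1.1B-Chat-v0.2-GGUF/ggml-model-q4_0.gguf",
            "tinyllama",
        )
```

The server most likely reads the config only at startup, so expect to restart the pod before the new model shows up.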

Now check out the [open-webui documentation](https://docs.openwebui.com/) to start playing with your full-featured local AI platform!
The beefier your local system is, the more fun it will be. I have an M2 Mac, so I run into limitations pretty quickly when using
open-webui's various features.
24 changes: 24 additions & 0 deletions platforms/ramalama-open-webui/model-config
@@ -0,0 +1,24 @@
{
  "host": "0.0.0.0",
  "port": 8080,
  "models": [
    {
      "model": "/ramalama/models/huggingface/bartowski/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf",
      "model_alias": "llama3",
      "n_gpu_layers": -1,
      "n_ctx": 2048
    },
    {
      "model": "/ramalama/models/huggingface/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
      "model_alias": "hermes",
      "n_gpu_layers": -1,
      "n_ctx": 2048
    },
    {
      "model": "/ramalama/models/huggingface/ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
      "model_alias": "granite-code",
      "n_gpu_layers": -1,
      "n_ctx": 2048
    }
  ]
}
57 changes: 57 additions & 0 deletions platforms/ramalama-open-webui/open-webui-llamacpp.yaml
@@ -0,0 +1,57 @@
apiVersion: v1
kind: Pod
metadata:
  name: webui-llamacpp-pod
spec:
  containers:
    - name: open-webui
      image: ghcr.io/open-webui/open-webui:main
      ports:
        - containerPort: 8001
          hostPort: 3000
      env:
        - name: ENABLE_OLLAMA_API
          value: "false"
        - name: WEBUI_AUTH
          value: "false"
        - name: PORT
          value: "8001"
        - name: OPENAI_API_BASE_URL
          value: "http://0.0.0.0:8080/v1"
        - name: OPENAI_API_KEY
          value: "empty"
      volumeMounts:
        - name: open-webui-data
          mountPath: /app/backend/data
        - name: ramalama-gguf
          mountPath: /ramalama
        - name: model-config
          mountPath: /configs/model-config
    - name: llamacpp-python
      image: quay.io/sallyom/llamacpp-python:latest
      ports:
        - containerPort: 8080
          hostPort: 9999
      env:
        - name: CONFIG_PATH
          value: /configs/model-config
        - name: CONFIG_FILE
          value: /configs/model-config
      volumeMounts:
        - name: ramalama-gguf
          mountPath: /ramalama
        - name: model-config
          mountPath: /configs/model-config
  volumes:
    - name: open-webui-data
      persistentVolumeClaim:
        claimName: open-webui
    - name: ramalama-gguf
      persistentVolumeClaim:
        claimName: ramalama-gguf
    - name: model-config
      # TODO: UPDATE THIS TO MATCH YOUR FILESYSTEM
      # model-config example is in this folder at ./model-config
      hostPath:
        path: /Users/somalley/git/containers/ai-lab-recipes/platforms/ramalama-open-webui/model-config
        type: File
6 changes: 6 additions & 0 deletions platforms/ramalama-open-webui/shortnames.conf
@@ -0,0 +1,6 @@
[shortnames]
"llama3" = "huggingface://bartowski/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf"
"tinyllama" = "huggingface://TinyLlama/TinyLlama-1.1B-Chat-v0.2-GGUF/ggml-model-q4_0.gguf"
"hermes" = "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf"
"granite-code" = "huggingface://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf"
