add open-webui, ramalama, llamacpp example
Signed-off-by: sallyom <[email protected]>
Showing 4 changed files with 160 additions and 0 deletions.

## Self-hosted AI Platform: Open-WebUI, Llamacpp model server, and Ramalama storage

If you've checked out [open-webui](https://docs.openwebui.com/), chances are you got up and running quickly with `ollama`. `open-webui` has a ton of
cool features, like web search and RAG in chats. I love `open-webui`, but I don't love all the extra baggage that comes with `ollama`. I
prefer to use [ramalama](https://github.com/containers/ramalama) to manage local model storage. Ramalama also incorporates llamacpp to serve
models; in this example, however, I'm using `ramalama` only for storage, with a standalone `llamacpp-python` model server built from
[../../model_servers/llamacpp_python/base/Containerfile](../../model_servers/llamacpp_python/base/Containerfile). Running `make build` from the
`../../model_servers/llamacpp_python` directory builds this image for you. If you're on `arm64` macOS, you can instead use the public
`quay.io/sallyom/llamacpp-python:latest` image; I haven't pushed an `amd64` image yet.
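
If you script your setup, a small helper can encode the image choice described above. This is a sketch that only reflects what this README states (the public image is `arm64`-only); the function name `llamacpp_image_for_arch` is hypothetical, and the non-`arm64` branch just prints a reminder to build locally rather than guessing the tag that `make build` produces.

```bash
# Hypothetical helper: choose an image for the llamacpp-python container.
# Encodes only what this README says: the public quay.io image is arm64-only;
# other architectures should build the image locally with `make build`.
llamacpp_image_for_arch() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    arm64|aarch64)
      echo "quay.io/sallyom/llamacpp-python:latest"
      ;;
    *)
      echo "no public image for $arch; run 'make build' in model_servers/llamacpp_python" >&2
      return 1
      ;;
  esac
}

# Usage:
#   llamacpp_image_for_arch          # uses the host's `uname -m`
#   llamacpp_image_for_arch x86_64   # returns 1 and prints a hint on stderr
```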

### Pull models from Hugging Face using `ramalama`

From the root of this repository:

```bash
cd ../ && git clone git@github.com:containers/ramalama && cd ramalama
python -m venv venv
source venv/bin/activate
pip install ramalama
# To view the shortnames that ramalama knows about, see
# ./venv/share/ramalama/shortnames.conf

ramalama pull granite-code # I've had i
ramalama pull llama3
ramalama pull hermes
# run `deactivate` to leave the virtualenv
# run `mkdir -p ~/.config/ramalama && cp ./venv/share/ramalama/shortnames.conf ~/.config/ramalama/shortnames.conf` to use ramalama outside of the virtualenv
# run something like `sudo cp ./bin/ramalama /usr/local/bin/` to add ramalama to your PATH

cd ../ai-lab-recipes
```
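
After pulling, you can confirm which GGUF files landed in ramalama's store. This sketch assumes the store lives at `~/.local/share/ramalama` (the directory the podman volume binds in the next step) and prints each file with the `/ramalama` prefix it will have inside the pod, which is the form `model-config` expects. The function name `list_ramalama_ggufs` is hypothetical.

```bash
# Hypothetical helper: list GGUF files in ramalama's local store, printed with
# the /ramalama prefix they will have once the store is mounted in the pod.
# Assumes the default rootless store location used elsewhere in this README.
list_ramalama_ggufs() {
  local store="${1:-$HOME/.local/share/ramalama}"
  # No -type filter, so symlinked model files are listed too; sort for stable output.
  find "$store/models" -name '*.gguf' | sort | while IFS= read -r f; do
    # Replace the host store prefix with the container mount point.
    echo "/ramalama${f#"$store"}"
  done
}

# Usage: list_ramalama_ggufs
# Each printed path can be used as a "model" value in model-config.
```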

Create a podman volume that bind-mounts ramalama's GGUF storage read-only (replace `/abs/path/to` with the absolute path to your home directory):

```bash
podman volume create --opt type=none --opt o=bind,ro --opt device=/abs/path/to/.local/share/ramalama ramalama-gguf
```

To inspect the volume:

```bash
podman volume list
podman volume inspect ramalama-gguf
```

Update [open-webui-llamacpp.yaml](./open-webui-llamacpp.yaml) at the `TODO` (lines 53-57) to specify the `hostPath` location of your [model-config](./model-config).
Then update `model-config` to point to the locations of the GGUF files in your ramalama storage; if you followed the steps above exactly, you shouldn't need any changes.
The llamacpp server uses this config file to find models. For more information about this feature, see the
[llama-cpp-python server configuration docs](https://llama-cpp-python.readthedocs.io/en/latest/server/#configuration-and-multi-model-support).
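
Before starting the pod, it can help to confirm that every `model` path in `model-config` actually exists in your ramalama store. This sketch assumes the `/ramalama` prefix in the config maps to `~/.local/share/ramalama` on the host (the `ramalama-gguf` volume) and that each `"model": "..."` pair sits on its own line, as in the example `model-config`; it is a line-oriented check, not a JSON parser. `check_model_config` is a hypothetical name.

```bash
# Hypothetical helper: verify each "model" path in model-config exists on the host.
# Assumes container paths start with /ramalama, which maps to the host store dir,
# and that every "model": "..." pair is on its own line (not a general JSON parser).
check_model_config() {
  local config="$1" store="${2:-$HOME/.local/share/ramalama}"
  local status=0 path
  [ -f "$config" ] || { echo "no such file: $config" >&2; return 1; }
  # sed prints only the quoted value of each "model" key; "model_alias" and
  # "models" don't match because the key must close right after "model".
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    if [ -e "$store${path#/ramalama}" ]; then
      echo "ok      $path"
    else
      echo "missing $path"
      status=1
    fi
  done <<EOF
$(sed -n 's/^[[:space:]]*"model":[[:space:]]*"\([^"]*\)".*/\1/p' "$config")
EOF
  return "$status"
}

# Usage (from the repository root):
#   check_model_config platforms/ramalama-open-webui/model-config
```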

Then, run the pod locally with podman:

```bash
podman kube play platforms/ramalama-open-webui/open-webui-llamacpp.yaml

# to see logs, check status, etc.
# (kube play names containers <pod name>-<container name>)
podman pod list
podman ps
podman logs webui-llamacpp-pod-open-webui
podman logs webui-llamacpp-pod-llamacpp-python
```
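
Loading models can take a while, so open-webui may show an empty model list right after the pod starts. A small polling loop like this sketch waits until the llamacpp server's `/v1/models` endpoint (published on host port 9999 by the pod spec) answers. `wait_for_url` is a hypothetical helper that relies only on `curl`.

```bash
# Hypothetical helper: poll a URL until it returns a successful HTTP response.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 2).
wait_for_url() {
  local url="$1" tries="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$tries" ]; do
    # -s: quiet, -f: fail on HTTP errors, -o: discard the body, --max-time: cap each try.
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up waiting for $url after $tries attempts" >&2
  return 1
}

# Usage: wait_for_url http://localhost:9999/v1/models
```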

Access `open-webui` at `http://localhost:3000`.

Access the llamacpp server at `http://localhost:9999/v1/models` to see the list of models.
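
Because the server exposes an OpenAI-compatible API, you can also talk to it directly with `curl`, which is handy for checking a model before debugging anything in open-webui. This sketch assumes the pod's host port 9999 and uses a `model_alias` from `model-config` (e.g. `llama3`) as the request's `model` field; `ask_model` is a hypothetical helper, and it does no JSON escaping, so keep prompts free of double quotes and backslashes.

```bash
# Hypothetical helper: send one chat message to the llamacpp server's
# OpenAI-compatible /v1/chat/completions endpoint and print the raw JSON reply.
# The model name must match a "model_alias" in model-config.
# NOTE: the prompt is inserted without JSON escaping (no " or \ in prompts).
ask_model() {
  local model="$1" prompt="$2" base="${3:-http://localhost:9999/v1}"
  curl -sf --max-time 300 "$base/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}]}"
}

# Usage:
#   curl -s http://localhost:9999/v1/models   # list the aliases the server loaded
#   ask_model llama3 "Write a haiku about containers."
```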

You can now use the open-webui features. Note that I've disabled the Ollama API, so open-webui only uses the generic OpenAI-compatible API; I don't have Ollama
running locally, and I don't need it. To download other models, I use `ramalama pull` and then update the `model-config` file that the
llamacpp-python server uses to locate models.
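
The pull-then-edit workflow can be partly scripted. This sketch turns a `huggingface://` reference from ramalama's shortnames file into a `model-config` entry. It assumes ramalama stores a pulled file at `models/huggingface/<org>/<repo>/<file>` under its store, which matches the existing `model-config` paths, and it reuses the `n_gpu_layers` and `n_ctx` values from that file. `model_config_entry` is hypothetical, and its output still has to be pasted into the `models` array by hand.

```bash
# Hypothetical helper: print a model-config entry for a ramalama huggingface:// reference.
# Assumes the store layout models/huggingface/<org>/<repo>/<file> seen in model-config
# and the /ramalama mount point used by the pod.
model_config_entry() {
  local name="$1" ref="$2"
  case "$ref" in
    huggingface://*) ;;
    *)
      echo "only huggingface:// references are handled: $ref" >&2
      return 1
      ;;
  esac
  printf '{\n  "model": "/ramalama/models/huggingface/%s",\n  "model_alias": "%s",\n  "n_gpu_layers": -1,\n  "n_ctx": 2048\n}\n' \
    "${ref#huggingface://}" "$name"
}

# Usage, e.g. for the tinyllama shortname:
#   ramalama pull tinyllama
#   model_config_entry tinyllama \
#     "huggingface://TinyLlama/TinyLlama-1.1B-Chat-v0.2-GGUF/ggml-model-q4_0.gguf"
# After adding the entry to model-config, recreate the pod so the server rereads it:
#   podman kube play --replace platforms/ramalama-open-webui/open-webui-llamacpp.yaml
```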

Now check out the [open-webui documentation](https://docs.openwebui.com/) to start playing with your full-featured local AI platform!
The beefier your local system, the more fun it will be. I'm on a Mac M2, so I hit some limitations pretty quickly when using
open-webui's various features.

`model-config`:

```json
{
  "host": "0.0.0.0",
  "port": 8080,
  "models": [
    {
      "model": "/ramalama/models/huggingface/bartowski/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf",
      "model_alias": "llama3",
      "n_gpu_layers": -1,
      "n_ctx": 2048
    },
    {
      "model": "/ramalama/models/huggingface/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
      "model_alias": "hermes",
      "n_gpu_layers": -1,
      "n_ctx": 2048
    },
    {
      "model": "/ramalama/models/huggingface/ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
      "model_alias": "granite-code",
      "n_gpu_layers": -1,
      "n_ctx": 2048
    }
  ]
}
```

`open-webui-llamacpp.yaml`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webui-llamacpp-pod
spec:
  containers:
    - name: open-webui
      image: ghcr.io/open-webui/open-webui:main
      ports:
        - containerPort: 8001
          hostPort: 3000
      env:
        - name: ENABLE_OLLAMA_API
          value: "false"
        - name: WEBUI_AUTH
          value: "false"
        - name: PORT
          value: "8001"
        - name: OPENAI_API_BASE_URL
          value: "http://0.0.0.0:8080/v1"
        - name: OPENAI_API_KEY
          value: "empty"
      volumeMounts:
        - name: open-webui-data
          mountPath: /app/backend/data
        - name: ramalama-gguf
          mountPath: /ramalama
        - name: model-config
          mountPath: /configs/model-config
    - name: llamacpp-python
      image: quay.io/sallyom/llamacpp-python:latest
      ports:
        - containerPort: 8080
          hostPort: 9999
      env:
        - name: CONFIG_PATH
          value: /configs/model-config
        - name: CONFIG_FILE
          value: /configs/model-config
      volumeMounts:
        - name: ramalama-gguf
          mountPath: /ramalama
        - name: model-config
          mountPath: /configs/model-config
  volumes:
    - name: open-webui-data
      persistentVolumeClaim:
        claimName: open-webui
    - name: ramalama-gguf
      persistentVolumeClaim:
        claimName: ramalama-gguf
    - name: model-config
      # TODO: UPDATE THIS TO MATCH YOUR FILESYSTEM
      # model-config example is in this folder at ./model-config
      hostPath:
        path: /Users/somalley/git/containers/ai-lab-recipes/platforms/ramalama-open-webui/model-config
        type: File
```

Ramalama shortnames:

```toml
[shortnames]
"llama3" = "huggingface://bartowski/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf"
"granite-code" = "huggingface://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf"
"tinyllama" = "huggingface://TinyLlama/TinyLlama-1.1B-Chat-v0.2-GGUF/ggml-model-q4_0.gguf"
"hermes" = "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf"
```