Add llmaz as another platform to run llama.cpp on Kubernetes #9096
Hi, llmaz is a platform for serving large language models on Kubernetes, and llama.cpp is a vital part of it for CPU inference. Here's an example:
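A minimal sketch of what serving a GGUF model through llmaz with the llama.cpp backend might look like (the API groups, resource kinds, and field names below are assumptions based on the llmaz examples, not a verified API reference; the model name and file are placeholders):

```yaml
# Hypothetical llmaz manifests: register a model, then serve it via llama.cpp.
# Resource kinds/fields are illustrative assumptions — check the llmaz docs.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0.5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF        # placeholder model
      filename: qwen2-0_5b-instruct-q5_k_m.gguf     # placeholder GGUF file
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0.5b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0.5b
  backendRuntimeConfig:
    backendName: llamacpp    # select the llama.cpp backend for CPU inference
```

Applying the two manifests with `kubectl apply -f` would stand up a llama.cpp-backed inference endpoint managed by llmaz.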
That's all you need to do to serve models with llama.cpp on Kubernetes. We also support multi-host inference and will try to integrate it with llama.cpp in the near future. If you're interested, here's an example provided by the lws community: https://github.com/kubernetes-sigs/lws/tree/main/docs/examples/llamacpp
Thanks!