Add llmaz as another platform to run llama.cpp on Kubernetes #9096

kerthcet · 2024-08-20T02:49:04Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

Hi, llmaz is a platform for serving large language models on Kubernetes, and llama.cpp is a vital part of it for CPU inference. Here's an example:

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0-5b-gguf
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      filename: qwen2-0_5b-instruct-q5_k_m.gguf
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0-5b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0-5b-gguf
  backendConfig:
    name: llamacpp
    args:
    - -fa # use flash attention

That is all you need to do to serve models with llama.cpp on Kubernetes. We also support multi-host inference, and we'll try to integrate it with llama.cpp in the near future. If you're interested, the lws community provides an example: https://github.com/kubernetes-sigs/lws/tree/main/docs/examples/llamacpp

Thanks!

Signed-off-by: kerthcet <[email protected]>

kerthcet · 2024-08-20T02:53:29Z

kindly ping @ggerganov

Add llmaz as another platform to run llama.cpp on Kubernetes

7323304

Signed-off-by: kerthcet <[email protected]>
