Replies: 3 comments
-
Yes, that would be cool, similar to https://docs.vllm.ai/en/latest/serving/metrics.html
-
For use in production, the bigger hurdle to clear is continuous batched inference, which is what would allow true multi-user usage. At this point Prometheus could give some insight into performance on the hardware you're running on, but you might as well just scale based on CPU utilization instead of more detailed usage stats. If llama-cpp-python reaches the point where we can use it reliably for on-premise inference, I'll gladly look into sensible Prometheus statistics to report so we can use them in KEDA.
-
Any update?
-
Hi folks! I love this project and have it generally working in Docker (built locally). I've got some ideas for giving it more operational durability (e.g. the issue I posted previously about stopping the model if a client disconnects, which looks like it needs llama.cpp support first), but one thing I'm a fan of is Prometheus metrics, particularly when running servers inside a Kubernetes environment. You can leverage them for custom horizontal-pod-autoscaler rules to address scalability and multi-user load.
I'm thinking about things like "is there an actively running model query?", "how many requests have we fielded?", and "how many completions vs. embeddings?".
I have enough experience with Prometheus itself to write the code for this myself, but I'm looking for guidance on how the application flow works in the API server. I take it this is based on FastAPI, although my Python is a bit rough and I've never implemented server backends in it before. Where would you recommend someone look first for adding "metrics hooks" into the API request code?
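For context, here's roughly the kind of hook I mean, sketched against a plain FastAPI app with prometheus_client. The metric names, labels, and middleware placement are my guesses for illustration, not how llama-cpp-python's server is actually structured:

```python
# Sketch only: a FastAPI app instrumented with prometheus_client.
# Metric names and labels here are hypothetical examples.
from fastapi import FastAPI, Request
from prometheus_client import Counter, Gauge, make_asgi_app

app = FastAPI()

# Total requests, labelled by path so completions vs. embeddings
# can be separated later in PromQL.
REQUESTS_TOTAL = Counter(
    "llama_requests_total",
    "Total API requests served",
    ["path"],
)

# How many model queries are in flight right now.
IN_FLIGHT = Gauge(
    "llama_requests_in_flight",
    "Requests currently being processed",
)

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    # Note: this also counts scrapes of /metrics itself; filter by
    # path here if that matters for your dashboards.
    REQUESTS_TOTAL.labels(path=request.url.path).inc()
    IN_FLIGHT.inc()
    try:
        return await call_next(request)
    finally:
        IN_FLIGHT.dec()

# Expose metrics in Prometheus text format at /metrics.
app.mount("/metrics", make_asgi_app())
```

If there's a better seam in the server's request flow to hang these on (per-endpoint handlers, the model call itself, etc.), that's exactly the guidance I'm after.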