server : refactor middleware and /health endpoint #9056

Merged
merged 6 commits into from
Aug 16, 2024

Conversation

ngxson
Collaborator

@ngxson ngxson commented Aug 16, 2024

/health endpoint

Originally, the /health endpoint was used to retrieve slot state. At the time, the /completions endpoint returned an error if no slot was available, so /health let the application wait until a slot became free.

Nowadays, the server can queue (defer) a request if no slot is available, and /health is used by Docker for health checking. This becomes a problem when the server is busy with a long task, because /health can time out. On HF inference endpoints, this puts the container into an unhealthy state, which triggers a forced restart.

Therefore, I propose a cleaner usage:

  • GET /health is now purely used to report actual health
  • GET /slots can be used as a replacement to get slot state
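As an illustration of the first bullet, here is a minimal client-side sketch of a readiness check that relies only on the status code of GET /health. The helper names (`get_status`, `wait_until_healthy`) and the URL in the usage note are assumptions made for this example and are not part of the server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def get_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example 503) are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return 0


def wait_until_healthy(probe: Callable[[], int], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe() until it reports HTTP 200 or the attempts run out."""
    for _ in range(attempts):
        if probe() == 200:
            return True
        time.sleep(delay)
    return False
```

A caller could use it as `wait_until_healthy(lambda: get_status("http://localhost:8080/health"))`, where the host and port are placeholders. Passing the probe as a callable keeps the retry loop independent of the network, which makes it easy to test.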

As a consequence, /health?fail_on_no_slot=1 is also moved to /slots?fail_on_no_slot=1 (the option is kept for compatibility).
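For clients that still call the old URL, the move can be handled with a small rewrite step. The following is a hedged sketch; `migrate_health_url` is a hypothetical helper written for this example, not something shipped with the server.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def migrate_health_url(url: str) -> str:
    """Rewrite /health?fail_on_no_slot=... to /slots?fail_on_no_slot=...

    URLs that do not use the fail_on_no_slot option are returned unchanged,
    since a plain GET /health remains valid.
    """
    parts = urlsplit(url)
    if parts.path == "/health" and "fail_on_no_slot" in parse_qs(parts.query):
        return urlunsplit(parts._replace(path="/slots"))
    return url
```

For example, `migrate_health_url("http://localhost:8080/health?fail_on_no_slot=1")` yields the same URL with the path changed to `/slots`; the host and port here are placeholders.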

Refactor middleware

Some repeated code blocks, for example setting Access-Control-Allow-Origin, are now moved to middleware.

The middleware is now also responsible for returning an error if the server is not yet ready.

When the server starts and the model is still being loaded, accessing any endpoint results in a 503 error code.
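The control flow described in this section can be sketched roughly as follows. The actual server is written in C++; this Python version only illustrates the idea, and every name in it (`ServerState`, `Response`, `with_middleware`) as well as the 503 body text is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class ServerState(Enum):
    LOADING_MODEL = "loading"
    READY = "ready"


@dataclass
class Response:
    status: int = 200
    body: str = ""
    headers: Dict[str, str] = field(default_factory=dict)


Handler = Callable[[str], Response]


def with_middleware(
    handler: Handler,
    get_state: Callable[[], ServerState],
    allow_origin: str = "*",
) -> Handler:
    """Wrap a handler so that shared logic lives in one place."""

    def wrapped(path: str) -> Response:
        if get_state() is ServerState.LOADING_MODEL:
            # Short-circuit: the endpoint handler is never called while loading.
            res = Response(status=503, body="model is loading")
        else:
            res = handler(path)
        # Set once here instead of repeating it in every handler.
        # Applying it to the 503 response too is a choice made in this sketch.
        res.headers["Access-Control-Allow-Origin"] = allow_origin
        return res

    return wrapped
```

With this shape, individual endpoint handlers no longer need to check whether the model is loaded or set the CORS header themselves.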

Behavior when model loading fails

If the model fails to load (for example, because the file does not exist), the server simply exits with status code 1. This resolves #7787, where a user reported that loading an invalid model causes the server to crash.
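A supervising script can use the exit status to tell an orderly failure apart from a crash. The sketch below is generic: `describe_exit` is a hypothetical helper, and mapping status 1 to "possible model load failure" is an interpretation based on the description above, not a guarantee that no other error uses the same code.

```python
import subprocess
from typing import Sequence


def describe_exit(cmd: Sequence[str]) -> str:
    """Run a command to completion and classify how it ended."""
    result = subprocess.run(cmd)
    code = result.returncode
    if code == 0:
        return "clean exit"
    if code < 0:
        # On POSIX, a negative return code means the process was killed by a signal.
        return "terminated by signal"
    if code == 1:
        # Assumption: for this server, status 1 may indicate a model load failure.
        return "exit status 1"
    return f"exit status {code}"
```

The command passed in would be the server invocation; it is left to the caller here because the exact binary name and arguments are not specified in this description.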



@github-actions github-actions bot added the python python script changes label Aug 16, 2024
@ngxson ngxson merged commit 8b3befc into ggerganov:master Aug 16, 2024
53 checks passed
@mcharytoniuk
Contributor

I started a discussion thread related to this issue. Please take a look: #9276

@ngxson ngxson added the breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. label Sep 2, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* server : refactor middleware and /health endpoint

* move "fail_on_no_slot" to /slots

* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* fix server tests

* fix CI

* update server docs

---------

Co-authored-by: Georgi Gerganov <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Labels
breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. examples python python script changes server
Development

Successfully merging this pull request may close these issues.

Refactor: investigate cleaner exception handling for server/server.cpp
3 participants