server: error handling #5776

z80maniac · 2024-02-28T18:38:26Z

This is a rough draft of a possible implementation of an error handling for server.

Example of an error returned by the server:

{
  "error": {
    "description": "llama_sampling_init: empty grammar",
    "id": "grammar.empty"
  }
}

It uses exceptions. The main pro is that stack unwinding is really convenient. The main con is that a developer must not forget to de-init everything if needed before throwing the exception.
The exception code is in a separate file to avoid circular dependencies. Currently it's header-only.
Current information in the exception: id - error type ID for API clients, description - text for humans. The description argument in the constructor is mandatory to force a developer to write some description for errors. Maybe an arbitrary json data can be saved too in a separate field for more detailed info, but there was no need for it as of now.
When an exception is created it logs its info to stderr. It may look weird, but saves a lot of typing. Maybe needs to be refactored.
Previously, server returned error 404 on errors but that prevented actual JSON to be returned. Instead it just returned a string File Not Found. Now the 404 error is removed. Something probably needs to be done here.
Throwing an exception means that the task will be left in an undefined state, so these exceptions are meant to be thrown in cases that otherwise will crash the program. They should be returned to the user ASAP, ideally without any other interactions with the task object data.

While these exceptions will also affect main and other programs, it shouldn't change much in terms of backwards compatibility, because these programs will just crash as usual, but with a different exception. An example of main crashing:

❯ ./main -m /opt/models/text/llama-2-13b-chat.Q4_K_M.gguf --grammar " "
...
ERROR [grammar.empty]: llama_sampling_init: empty grammar
terminate called after throwing an instance of 'llama_error'
  what():  std::exception
fish: Job 1, './main -m /opt/models/text/llam…' terminated by signal SIGABRT (Abort)

Currently the following error situations are handled (chosen to represent various places where the error can occur):

Empty grammar (error outside the server code)

curl -sS --data '{"prompt":"Hello", "grammar":" "}' http://127.0.0.1:8080/completion | jq

{
  "error": {
    "description": "llama_sampling_init: empty grammar",
    "id": "grammar.empty"
  }
}

Invalid grammar (error outside the server code)

curl -sS --data '{"prompt":"Hello", "grammar":"("}' http://127.0.0.1:8080/completion | jq

{
  "error": {
    "description": "parse: error parsing grammar: expecting name at (",
    "id": "grammar.invalid"
  }
}

Empty prompt (error while parsing the params)

curl -Ss --data '{}' http://127.0.0.1:8080/completion | jq

{
  "error": {
    "description": "The prompt must not be empty",
    "id": "prompt.empty"
  }
}

Invalid request JSON (error before starting a task)

curl -sS --data '{' http://127.0.0.1:8080/completion | jq

{
  "error": {
    "description": "Invalid JSON: [json.exception.parse_error.101] parse error at line 1, column 2: syntax error while parsing object key - unexpected end of input; expected string literal",
    "id": "request.invalid_json"
  }
}

Regarding the "empty prompt" case. I know that it was "fixed" in #5733, but IMHO that was not a proper fix. If an API client passes an empty string, the server should not assume that the client actually wanted a space character. API client can pass this character by itself if needed.

ngxson · 2024-03-08T10:21:32Z

Can you migrate this change to latest master? As mentioned in #5882 (comment) , we may want the error response to be compatible with OpenAI format. Could you work on that? Thanks.

ngxson · 2024-03-08T10:40:33Z

If an API client passes an empty string, the server should not assume that the client actually wanted a space character. API client can pass this character by itself if needed.

FYI, I don't want to return an error in this case to match OpenAI behavior: it still returns embedding even for empty prompt.

This problem is fixed in the refactor PR, and the server now can return embedding even with empty prompt (no need to add space, because BOS token is now added).

z80maniac · 2024-03-08T17:41:02Z

Can you migrate this change to latest master?

Done.

we may want the error response to be compatible with OpenAI format

Changed id/description to type/message.

and the server now can return embedding even with empty prompt

Removed the error on empty prompt. However, my fix was mostly for /completion and it still does not work for empty prompt:

❯ curl -sS --data '{"n_predict":10, "prompt":""}' http://127.0.0.1:8080/completion | jq .content
""

I requested 10 tokens and it returned nothing. Does it mean that for /completion that error is still needed?

ngxson · 2024-03-08T22:19:34Z

I requested 10 tokens and it returned nothing. Does it mean that for /completion that error is still needed?

Yeah seem like BOS is not added for empty prompt. It's more likely be a bug with detokenize function.

ggerganov · 2024-03-09T09:47:25Z

Should be fixed in: #5953

ggerganov

It uses exceptions. The main pro is that stack unwinding is really convenient. The main con is that a developer must not forget to de-init everything if needed before throwing the exception.

I prefer to not use exceptions, so introducing llama_error is not OK. For handling errors, the functions should return error codes or bool

z80maniac · 2024-03-09T10:17:40Z

introducing llama_error is not OK

I see. So this whole PR does not make any sense then.

ngxson · 2024-03-09T10:35:14Z

@z80maniac don't worry I'll propose an approach on one of my next PR. However, I'll need your help because it's quite difficult for me to work on the frontend code. The frontend changes you introduced in this PR is still usable though.

z80maniac · 2024-03-09T11:14:45Z

I don't really care about web UI, but I think that the error handling there requires major changes. In my PR I just kind of hacked the solution so it does not crash at least, but the whole error handling in the web UI is really weird, IMHO. The error is passed as a string, not valid JSON, then JSON is kind of awkwardly extracted from it, but the real JSON is actually inside that JSON, so it gets decoded again... And we end up with this:

result.error = JSON.parse(JSON.parse(result.error).content).error;

As I said I don't really care about web UI, and don't even use it, so I just left this all this logic intact (just wrote it differently), but maybe something needs to be done here. Or maybe it all works as intended, I don't know. Didn't really look too deep into this.

server: error handling

bcb60f3

z80maniac mentioned this pull request Feb 28, 2024

server : improvements and maintenance #4216

Open

10 tasks

z80maniac added 3 commits March 8, 2024 20:13

Merge remote-tracking branch 'upstream/master' into errors

fb83863

server: error handling: fixes after merge

8a8aaee

server: error handling: rename error fields to match OpenAI API

444f32e

z80maniac added 2 commits March 9, 2024 12:01

server: error handling: fix double free of ctx_sampling

a4f25f8

server: error handling: convert all unhandled exceptions to API errors

bd5cca1

ggerganov mentioned this pull request Mar 9, 2024

server : simplify logic for empty prompts #5953

Merged

ggerganov requested changes Mar 9, 2024

View reviewed changes

z80maniac added 2 commits March 9, 2024 13:02

server: error handling: always use new format for API errors

7809dbc

Merge remote-tracking branch 'upstream/master' into errors

a5b6086

z80maniac closed this Mar 9, 2024

ngxson mentioned this pull request Mar 9, 2024

Server: format error to json #5961

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

server: error handling #5776

server: error handling #5776

z80maniac commented Feb 28, 2024

ngxson commented Mar 8, 2024

ngxson commented Mar 8, 2024 •

edited

Loading

z80maniac commented Mar 8, 2024

ngxson commented Mar 8, 2024 •

edited

Loading

ggerganov commented Mar 9, 2024

ggerganov left a comment

z80maniac commented Mar 9, 2024

ngxson commented Mar 9, 2024

z80maniac commented Mar 9, 2024

server: error handling #5776

server: error handling #5776

Conversation

z80maniac commented Feb 28, 2024

ngxson commented Mar 8, 2024

ngxson commented Mar 8, 2024 • edited Loading

z80maniac commented Mar 8, 2024

ngxson commented Mar 8, 2024 • edited Loading

ggerganov commented Mar 9, 2024

ggerganov left a comment

Choose a reason for hiding this comment

z80maniac commented Mar 9, 2024

ngxson commented Mar 9, 2024

z80maniac commented Mar 9, 2024

ngxson commented Mar 8, 2024 •

edited

Loading

ngxson commented Mar 8, 2024 •

edited

Loading