Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: error handling #5776

Closed
wants to merge 8 commits into from
Closed

server: error handling #5776

wants to merge 8 commits into from

Conversation

z80maniac
Copy link
Contributor

This is a rough draft of a possible implementation of an error handling for server.

Example of an error returned by the server:

{
  "error": {
    "description": "llama_sampling_init: empty grammar",
    "id": "grammar.empty"
  }
}
  • It uses exceptions. The main pro is that stack unwinding is really convenient. The main con is that a developer must not forget to de-init everything if needed before throwing the exception.
  • The exception code is in a separate file to avoid circular dependencies. Currently it's header-only.
  • Current information in the exception: id - error type ID for API clients, description - text for humans. The description argument in the constructor is mandatory to force a developer to write some description for errors. Maybe an arbitrary json data can be saved too in a separate field for more detailed info, but there was no need for it as of now.
  • When an exception is created it logs its info to stderr. It may look weird, but saves a lot of typing. Maybe needs to be refactored.
  • Previously, server returned error 404 on errors but that prevented actual JSON to be returned. Instead it just returned a string File Not Found. Now the 404 error is removed. Something probably needs to be done here.
  • Throwing an exception means that the task will be left in an undefined state, so these exceptions are meant to be thrown in cases that otherwise will crash the program. They should be returned to the user ASAP, ideally without any other interactions with the task object data.

While these exceptions will also affect main and other programs, it shouldn't change much in terms of backwards compatibility, because these programs will just crash as usual, but with a different exception. An example of main crashing:

❯ ./main -m /opt/models/text/llama-2-13b-chat.Q4_K_M.gguf --grammar " "
...
ERROR [grammar.empty]: llama_sampling_init: empty grammar
terminate called after throwing an instance of 'llama_error'
  what():  std::exception
fish: Job 1, './main -m /opt/models/text/llam…' terminated by signal SIGABRT (Abort)

Currently the following error situations are handled (chosen to represent various places where the error can occur):

Empty grammar (error outside the server code)
curl -sS --data '{"prompt":"Hello", "grammar":" "}' http://127.0.0.1:8080/completion | jq
{
  "error": {
    "description": "llama_sampling_init: empty grammar",
    "id": "grammar.empty"
  }
}
Invalid grammar (error outside the server code)
curl -sS --data '{"prompt":"Hello", "grammar":"("}' http://127.0.0.1:8080/completion | jq
{
  "error": {
    "description": "parse: error parsing grammar: expecting name at (",
    "id": "grammar.invalid"
  }
}
Empty prompt (error while parsing the params)
curl -Ss --data '{}' http://127.0.0.1:8080/completion | jq
{
  "error": {
    "description": "The prompt must not be empty",
    "id": "prompt.empty"
  }
}
Invalid request JSON (error before starting a task)
curl -sS --data '{' http://127.0.0.1:8080/completion | jq
{
  "error": {
    "description": "Invalid JSON: [json.exception.parse_error.101] parse error at line 1, column 2: syntax error while parsing object key - unexpected end of input; expected string literal",
    "id": "request.invalid_json"
  }
}

Regarding the "empty prompt" case. I know that it was "fixed" in #5733, but IMHO that was not a proper fix. If an API client passes an empty string, the server should not assume that the client actually wanted a space character. API client can pass this character by itself if needed.

@ngxson
Copy link
Collaborator

ngxson commented Mar 8, 2024

Can you migrate this change to latest master? As mentioned in #5882 (comment) , we may want the error response to be compatible with OpenAI format. Could you work on that? Thanks.

@ngxson
Copy link
Collaborator

ngxson commented Mar 8, 2024

If an API client passes an empty string, the server should not assume that the client actually wanted a space character. API client can pass this character by itself if needed.

FYI, I don't want to return an error in this case to match OpenAI behavior: it still returns embedding even for empty prompt.

This problem is fixed in the refactor PR, and the server now can return embedding even with empty prompt (no need to add space, because BOS token is now added).

@z80maniac
Copy link
Contributor Author

Can you migrate this change to latest master?

Done.

we may want the error response to be compatible with OpenAI format

Changed id/description to type/message.

and the server now can return embedding even with empty prompt

Removed the error on empty prompt. However, my fix was mostly for /completion and it still does not work for empty prompt:

❯ curl -sS --data '{"n_predict":10, "prompt":""}' http://127.0.0.1:8080/completion | jq .content
""

I requested 10 tokens and it returned nothing. Does it mean that for /completion that error is still needed?

@ngxson
Copy link
Collaborator

ngxson commented Mar 8, 2024

I requested 10 tokens and it returned nothing. Does it mean that for /completion that error is still needed?

Yeah seem like BOS is not added for empty prompt. It's more likely be a bug with detokenize function.

@ggerganov
Copy link
Owner

Should be fixed in: #5953

Copy link
Owner

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses exceptions. The main pro is that stack unwinding is really convenient. The main con is that a developer must not forget to de-init everything if needed before throwing the exception.

I prefer to not use exceptions, so introducing llama_error is not OK. For handling errors, the functions should return error codes or bool

@z80maniac
Copy link
Contributor Author

introducing llama_error is not OK

I see. So this whole PR does not make any sense then.

@z80maniac z80maniac closed this Mar 9, 2024
@ngxson
Copy link
Collaborator

ngxson commented Mar 9, 2024

@z80maniac don't worry I'll propose an approach on one of my next PR. However, I'll need your help because it's quite difficult for me to work on the frontend code. The frontend changes you introduced in this PR is still usable though.

@z80maniac
Copy link
Contributor Author

I don't really care about web UI, but I think that the error handling there requires major changes. In my PR I just kind of hacked the solution so it does not crash at least, but the whole error handling in the web UI is really weird, IMHO. The error is passed as a string, not valid JSON, then JSON is kind of awkwardly extracted from it, but the real JSON is actually inside that JSON, so it gets decoded again... And we end up with this:

result.error = JSON.parse(JSON.parse(result.error).content).error;

As I said I don't really care about web UI, and don't even use it, so I just left this all this logic intact (just wrote it differently), but maybe something needs to be done here. Or maybe it all works as intended, I don't know. Didn't really look too deep into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants