
llama : (proposal) return enum for llama_decode and llama_encode #9434

Closed
wants to merge 1 commit

Conversation

@ngxson (Collaborator) commented Sep 11, 2024

The return values of llama_encode and llama_decode are currently not well documented.

This PR proposes using an enum for them. I'm not sure this is the best way to do it, so feel free to discuss.
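For illustration, here is a rough sketch of the kind of enum this is about; the enumerator names and values below are hypothetical and not necessarily the ones used in this PR:

```c
// Hypothetical sketch only — the exact enumerators in the PR may differ.
// Today llama_decode/llama_encode return an int32_t: 0 on success, a positive
// value for warnings (e.g. no KV cache slot found), and a negative value on error.
enum llama_decode_status {
    LLAMA_DECODE_STATUS_SUCCESS        =  0,  // batch processed successfully
    LLAMA_DECODE_STATUS_NO_KV_SLOT     =  1,  // could not find a KV cache slot for the batch
    LLAMA_DECODE_STATUS_INVALID_BATCH  = -1,  // e.g. empty batch or inconsistent inputs
    LLAMA_DECODE_STATUS_COMPUTE_FAILED = -2,  // graph computation failed
};

// The signature would change from:
//     int32_t llama_decode(struct llama_context * ctx, struct llama_batch batch);
// to something like:
//     enum llama_decode_status llama_decode(struct llama_context * ctx, struct llama_batch batch);
```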


@ngxson requested a review from ggerganov September 11, 2024 13:38
@ggerganov (Owner)

Another option is to have a single enum llama_status that encodes all return codes for the entire API. Not sure which is the better practice.
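A rough sketch of what such an API-wide enum might look like (the enumerators are hypothetical, not an existing llama.h type):

```c
// Hypothetical sketch of a single, library-wide status enum.
enum llama_status {
    LLAMA_STATUS_SUCCESS       =  0,
    LLAMA_STATUS_FAILED        = -1,  // generic, unspecified failure
    LLAMA_STATUS_ALLOC_FAILED  = -2,  // memory allocation failed
    LLAMA_STATUS_INVALID_PARAM = -3,  // invalid argument passed by the caller
};

// Every fallible function would then share the same return type, e.g.:
//     enum llama_status llama_decode(struct llama_context * ctx, struct llama_batch batch);
//     enum llama_status llama_control_vector_apply(/* ... */);
```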

@ngxson (Collaborator, Author) commented Sep 11, 2024

I have just had a look at all the public functions of the library to see which ones could benefit from returning an enum status:

  • both llama_decode and llama_encode currently return positive and negative numbers as status codes
  • llama_lora_adapter_set and llama_lora_adapter_remove return -1 on error
  • llama_control_vector_apply returns -1 on error

So for now I think having one single enum llama_result may not be a good idea. We could probably have two sets of result codes:

  • One for llama_decode and llama_encode, since they are mostly the same
  • One for all other APIs

What do you think about this?

@slaren (Collaborator) commented Sep 11, 2024

All functions that can fail should return a status. This would also include llama_load_model_from_file, llama_new_context_with_model, llama_model_quantize, llama_get_logits_ith, llama_token_get_text, and a lot more.

The status codes in this PR are too specific to be reused, but they don't need to be. A generic "failed memory allocation" and an "invalid parameter" status would cover most cases. IMO it's not good to be too specific with error codes: applications do not need that level of detail, and it exposes implementation details that may change in the future.
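For example, with only generic codes an application could still react in a sensible way (a sketch reusing the hypothetical llama_status values from above):

```c
// Sketch of caller-side handling with generic status codes
// (uses the hypothetical llama_status enum from the earlier sketch).
enum llama_status status = llama_decode(ctx, batch);
switch (status) {
    case LLAMA_STATUS_SUCCESS:
        break;                       // continue with the generated logits
    case LLAMA_STATUS_ALLOC_FAILED:
        // potentially recoverable: e.g. retry with a smaller batch
        break;
    case LLAMA_STATUS_INVALID_PARAM:
        // programming error on the caller's side: log and fix
        break;
    default:
        // unspecified failure: give up on this context
        break;
}
```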

@ngxson (Collaborator, Author) commented Sep 11, 2024

@slaren Yup, I agree that the error codes introduced in this PR are a bit too specific (I'm just doing a 1-to-1 mapping from the old codes, though).

As for other functions like llama_load_model_from_file or llama_new_context_with_model, the problem is that they currently return a struct pointer, so changing the function signature may be a breaking change. It would be possible to resolve this with function overloading, but I'm not sure that is applicable to the C-style API in llama.h (?)
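One conventional C pattern that avoids breaking the existing signatures, sketched here only as an illustration (the _ex name is hypothetical and not part of llama.h), is to add a variant that returns a status and hands the object back through an out-parameter:

```c
// Hypothetical sketch — this function does not exist in llama.h.
// The existing llama_load_model_from_file keeps its signature; a new entry
// point returns the status and writes the model through an out-parameter.
enum llama_status llama_load_model_from_file_ex(
        const char                * path_model,
        struct llama_model_params   params,
        struct llama_model       ** out_model);  // set to NULL on failure
```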

@slaren (Collaborator) commented Sep 11, 2024

It would be a breaking change for sure, most functions would need to be changed, but it should be done eventually. My preference would be to change all functions that can fail to return a status code. ggml also needs a similar refactor.

@Xarbirus (Contributor)

I had the same idea and made a similar solution in my fork, but I solved the problem with a single llama_status. I just couldn't get around to polishing the solution, so I didn't publish it. In any case, it's very good that I'm not the only one thinking about this.

When designing these statuses, it is also important not to forget that ggml_backend_sched_graph_compute and ggml_backend_sched_graph_compute_async return a ggml_status, which must be propagated from llama_graph_compute and handled in llama_encode and llama_decode.

Therefore ggml_status needs to be somehow combined with llama_status(es).
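A sketch of what that combination could look like, assuming the hypothetical llama_status above; the GGML_STATUS_* enumerators are the existing ggml ones, while LLAMA_STATUS_ABORTED is an assumed addition:

```c
// Sketch: translate ggml_status (returned by ggml_backend_sched_graph_compute)
// into the hypothetical llama_status so llama_decode/llama_encode can propagate it.
// LLAMA_STATUS_ABORTED is an assumed new enumerator, not an existing one.
static enum llama_status llama_status_from_ggml(enum ggml_status status) {
    switch (status) {
        case GGML_STATUS_SUCCESS:      return LLAMA_STATUS_SUCCESS;
        case GGML_STATUS_ABORTED:      return LLAMA_STATUS_ABORTED;
        case GGML_STATUS_ALLOC_FAILED: return LLAMA_STATUS_ALLOC_FAILED;
        case GGML_STATUS_FAILED:
        default:                       return LLAMA_STATUS_FAILED;
    }
}
```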

@conradev (Contributor)

I would love to see this change. I introduced the ability to abort the Metal backend in ggml (85fca8d), and I want to bring support for this up into llama.cpp and llama_decode. Having explicit error codes would be very helpful!

@ngxson (Collaborator, Author) commented Sep 11, 2024

@slaren I agree that changing all functions to return a status code is a good idea. This will be a big breaking change though, so I think we should do it at a later stage (given that there are currently some other ongoing reworks in llama.cpp).

@Xarbirus @conradev As discussed earlier, the statuses returned from llama_decode/encode don't need to be too specific. The ability to abort the operation is nice though, so llama_status_aborted could be one thing to add.

For now I will close this PR since the proposed approach is not what we want, but feel free to discuss if you have other ideas.
