
llama : (proposal) return enum for llama_decode and llama_encode #9434

Closed
wants to merge 1 commit

Conversation

@ngxson (Collaborator) commented Sep 11, 2024

The return values of llama_encode and llama_decode are currently not well documented.

This PR proposes using an enum for them. I'm not sure this is the best way to do it, so feel free to discuss.
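For illustration, here is a rough sketch of the kind of enum this is about; the enumerator names and values below are hypothetical and not necessarily the ones used in this PR:

```c
// Hypothetical sketch only — the exact enumerators in the PR may differ.
// Today llama_decode/llama_encode return an int32_t: 0 on success, a positive
// value for warnings (e.g. no KV cache slot found), and a negative value on error.
enum llama_decode_status {
    LLAMA_DECODE_STATUS_SUCCESS        =  0,  // batch processed successfully
    LLAMA_DECODE_STATUS_NO_KV_SLOT     =  1,  // could not find a KV cache slot for the batch
    LLAMA_DECODE_STATUS_INVALID_BATCH  = -1,  // e.g. empty batch or inconsistent inputs
    LLAMA_DECODE_STATUS_COMPUTE_FAILED = -2,  // graph computation failed
};

// The signature would change from:
//     int32_t llama_decode(struct llama_context * ctx, struct llama_batch batch);
// to something like:
//     enum llama_decode_status llama_decode(struct llama_context * ctx, struct llama_batch batch);
```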


@ngxson requested a review from ggerganov September 11, 2024 13:38
@ggerganov (Owner)

Another option is to have a single enum llama_status that encodes all return codes for the entire API. Not sure which is the better practice.
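A rough sketch of what such an API-wide enum might look like (the enumerators are hypothetical, not an existing llama.h type):

```c
// Hypothetical sketch of a single, library-wide status enum.
enum llama_status {
    LLAMA_STATUS_SUCCESS       =  0,
    LLAMA_STATUS_FAILED        = -1,  // generic, unspecified failure
    LLAMA_STATUS_ALLOC_FAILED  = -2,  // memory allocation failed
    LLAMA_STATUS_INVALID_PARAM = -3,  // invalid argument passed by the caller
};

// Every fallible function would then share the same return type, e.g.:
//     enum llama_status llama_decode(struct llama_context * ctx, struct llama_batch batch);
//     enum llama_status llama_control_vector_apply(/* ... */);
```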

@ngxson (Collaborator, Author) commented Sep 11, 2024

I have just had a look at all the public functions of the library to see which ones could benefit from returning an enum status:

  • both llama_decode and llama_encode currently return positive and negative numbers as status codes
  • llama_lora_adapter_set and llama_lora_adapter_remove return -1 on error
  • llama_control_vector_apply returns -1 on error

So for now I think having one single enum llama_result may not be a good idea. We could probably have two sets of result codes:

  • One for llama_decode and llama_encode, since they are mostly the same
  • One for all other APIs

What do you think about this?

@slaren (Collaborator) commented Sep 11, 2024

All functions that can fail should return a status. This would also include llama_load_model_from_file, llama_new_context_with_model, llama_model_quantize, llama_get_logits_ith, llama_token_get_text, and a lot more.

The status codes in this PR are too specific to be reused, but they don't need to be. A generic "failed memory allocation" and an "invalid parameter" status would cover most cases. IMO it's not good to be too specific with error codes: applications do not need that level of detail, and it exposes implementation details that may change in the future.
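For example, with only generic codes an application could still react in a sensible way (a sketch reusing the hypothetical llama_status values from above):

```c
// Sketch of caller-side handling with generic status codes
// (uses the hypothetical llama_status enum from the earlier sketch).
enum llama_status status = llama_decode(ctx, batch);
switch (status) {
    case LLAMA_STATUS_SUCCESS:
        break;                       // continue with the generated logits
    case LLAMA_STATUS_ALLOC_FAILED:
        // potentially recoverable: e.g. retry with a smaller batch
        break;
    case LLAMA_STATUS_INVALID_PARAM:
        // programming error on the caller's side: log and fix
        break;
    default:
        // unspecified failure: give up on this context
        break;
}
```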

@ngxson (Collaborator, Author) commented Sep 11, 2024

@slaren Yup, I agree that the error codes introduced in this PR are a bit too specific (I'm just doing a 1-to-1 mapping from the old codes, though).

As for other functions like llama_load_model_from_file or llama_new_context_with_model, the problem is that they currently return a struct pointer, so changing the function signature may be a breaking change. It would be possible to resolve this with function overloading, but I'm not sure that is applicable to the C-style API in llama.h (?)
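One conventional C pattern that avoids breaking the existing signatures, sketched here only as an illustration (the _ex name is hypothetical and not part of llama.h), is to add a variant that returns a status and hands the object back through an out-parameter:

```c
// Hypothetical sketch — this function does not exist in llama.h.
// The existing llama_load_model_from_file keeps its signature; a new entry
// point returns the status and writes the model through an out-parameter.
enum llama_status llama_load_model_from_file_ex(
        const char                * path_model,
        struct llama_model_params   params,
        struct llama_model       ** out_model);  // set to NULL on failure
```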

@slaren (Collaborator) commented Sep 11, 2024

It would be a breaking change for sure, most functions would need to be changed, but it should be done eventually. My preference would be to change all functions that can fail to return a status code. ggml also needs a similar refactor.

@Xarbirus (Contributor)

I had the same idea and made a similar solution in my fork, but I solved the problem with a single llama_status. I just couldn't get around to polishing the solution, so I didn't publish it. In any case, it's very good that I'm not the only one thinking about this.

When designing these statuses, it is also important not to forget that ggml_backend_sched_graph_compute and ggml_backend_sched_graph_compute_async return a ggml_status, which must be propagated from llama_graph_compute and handled in llama_encode and llama_decode.

Therefore ggml_status needs to be somehow combined with llama_status(es).
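A sketch of what that combination could look like, assuming the hypothetical llama_status above; the GGML_STATUS_* enumerators are the existing ggml ones, while LLAMA_STATUS_ABORTED is an assumed addition:

```c
// Sketch: translate ggml_status (returned by ggml_backend_sched_graph_compute)
// into the hypothetical llama_status so llama_decode/llama_encode can propagate it.
// LLAMA_STATUS_ABORTED is an assumed new enumerator, not an existing one.
static enum llama_status llama_status_from_ggml(enum ggml_status status) {
    switch (status) {
        case GGML_STATUS_SUCCESS:      return LLAMA_STATUS_SUCCESS;
        case GGML_STATUS_ABORTED:      return LLAMA_STATUS_ABORTED;
        case GGML_STATUS_ALLOC_FAILED: return LLAMA_STATUS_ALLOC_FAILED;
        case GGML_STATUS_FAILED:
        default:                       return LLAMA_STATUS_FAILED;
    }
}
```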

@conradev (Contributor)

I would love to see this change. I introduced the ability to abort the Metal backend in ggml (85fca8d), and I want to bring support for this up into llama.cpp and llama_decode. Having explicit error codes would be very helpful!

@ngxson (Collaborator, Author) commented Sep 11, 2024

@slaren I agree that changing all functions to return a status code is a good idea. This will be a big breaking change though, so I think we should do it at a later stage (given that there are currently some other ongoing reworks in llama.cpp).

@Xarbirus @conradev As discussed earlier, the statuses returned from llama_decode/encode don't need to be too specific. The ability to abort the operation is nice though, so llama_status_aborted could be one thing to add.

For now I will close this PR since the proposed approach is not what we want, but feel free to discuss if you have other ideas.
