Feature Request: Better chat UX for llama-cli #11202

Open
ngxson opened this issue Jan 12, 2025 · 8 comments
Labels: enhancement (New feature or request)

Comments

@ngxson
Collaborator

ngxson commented Jan 12, 2025

Feature Description

Motivation

My first attempt at adding commands to llama-cli suggests that main.cpp is a bit too complex for a chat-based application.

Now, you might ask: don’t we already have llama-run and simple-chat for that? True, but the issue is that these examples are largely standalone - they don’t rely on common.cpp.

Given that most users are already familiar with llama-cli, enhancing the chat UX within it could benefit a majority of users, including those who don’t actively follow llama.cpp’s development. This approach is similar to what we did with the updated UI for llama-server: a seamless, drop-in replacement that eliminates the need for users to change their workflows.

Possible Implementation

One way to do this is to split all the chat-related logic into main/chat.hpp and call it from main.cpp; see #11203.
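For illustration, here is a rough sketch of what such a main/chat.hpp split could look like (hypothetical names, not the actual #11203 code): chat state and slash-commands live behind a small struct that main.cpp drives, instead of being interleaved with the token loop.

```cpp
// main/chat.hpp -- hypothetical sketch of the proposed split;
// all names here are illustrative only
#pragma once

#include <string>
#include <vector>

struct chat_message {
    std::string role;    // "user" or "assistant"
    std::string content;
};

struct chat_session {
    std::vector<chat_message> messages;

    // append a user turn; main.cpp then formats and tokenizes it
    void add_user(const std::string & text) {
        messages.push_back({"user", text});
    }

    // append the model's reply once generation finishes
    void add_assistant(const std::string & text) {
        messages.push_back({"assistant", text});
    }

    // handle slash-commands such as /regen or /readfile;
    // returns true if the line was a command
    bool handle_command(const std::string & line);
};
```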

CC @ggerganov and @slaren for discussion, thanks!

@ngxson ngxson added the enhancement label Jan 12, 2025
@ggerganov
Owner

I think it's OK to repurpose llama-cli mainly for chat and remove all the extra functionality such as context extension, custom prefix/suffixes, antiprompts, sessions, etc. These can be separated in a new example oriented for developers to experiment with this kind of special functionality.

#11203 looks OK to me. IMO we can make a quick pass to strip main.cpp of the special features that I mentioned above - let me know if you want me to help with this. Later, we'll re-implement those depending on the feedback about their removal.

@slaren
Collaborator

slaren commented Jan 12, 2025

simple-chat is meant as a demonstration of the API; it was never intended to be a complete chat program. llama-run, however, is. I am not sure that it is a good idea to have two examples competing to do the same job.

I disagree about breaking llama-cli and just re-adding features based on feedback. IMO we need to take backwards compatibility a lot more seriously than we do, at some point if we keep breaking other people's programs and scripts, they will just give up and move elsewhere. We need to have the mindset that each new API or flag added will need to be supported for a long time.

@MaggotHATE
Contributor

Prefix and suffix are definitely needed in main or its substitute: for now, most examples use predefined chat templates only. That's not always convenient - there are cases when a custom template is needed, or when the existing template needs to be extended (for example, the suffix might contain the beginning of the input as additional guidance). I see that this is simply combined with common_chat_apply_template, so it shouldn't be a problem.

As for antiprompts, aren't they necessary for base (not instruct) models and custom instructions? Would eos be enough?
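A minimal sketch of the prefix/suffix extension described above, with a toy stand-in for the templating call (common_chat_apply_template's real signature is not assumed here):

```cpp
// Sketch only: apply_chat_template below is a toy stand-in for the real
// templating call; the tokens used are assumptions for illustration.
#include <string>

static std::string apply_chat_template(const std::string & user_text) {
    // toy predefined template
    return "<|user|>\n" + user_text + "\n<|assistant|>\n";
}

// A custom prefix/suffix extends the predefined template rather than
// replacing it: the suffix can seed the start of the model's reply as
// additional guidance, which a template alone cannot express.
static std::string build_prompt(const std::string & user_text,
                                const std::string & prefix,
                                const std::string & suffix) {
    return prefix + apply_chat_template(user_text) + suffix;
}

// e.g. build_prompt("Summarize this file.", "", "Sure, here is a summary:")
```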

@ggerganov
Owner

Yes, I guess the functions such as /regen, /readfile, etc. would make more sense to be added to the llama-run example to avoid having two chat-focused examples. If we do it like this, then llama-cli probably no longer needs to support chat mode and can be simplified?

@ngxson
Collaborator Author

ngxson commented Jan 12, 2025

> simple-chat is meant as a demonstration of the API; it was never intended to be a complete chat program. llama-run, however, is. I am not sure that it is a good idea to have two examples competing to do the same job.

The main problem with llama-run is that it doesn't use common.cpp, so it seems more like a full production-ready program to me (i.e. it should be a dedicated repo instead of an example inside llama.cpp).

My plan is to try not to break too many things while doing #11203. @MaggotHATE Suffix and prefix are easy to add back; I totally understand that many users still use them for alpaca-style chatbots. There may be other things to worry about too, but I'll make a list once we're clearer about the overall direction.

> If we do it like this, then llama-cli probably no longer needs to support chat mode and can be simplified?

Yup, that's one way. But the problem is that most guides I can find on the internet already tell users to use llama-cli, so we would need to ask users to migrate to llama-run. To me, removing chat support from llama-cli is a bit like giving up a domain name that already has a good SEO score (I'm talking about you, twitter.com).

A hypothetical better plan could be:

  • Use more common_ in llama-run
  • Compile both run.cpp and main.cpp into one package and distribute it as llama-cli (a rough dispatch sketch follows)
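A rough sketch of the second bullet, with invented names: both entry points are renamed, linked into one binary, and a flag selects the code path.

```cpp
// Hypothetical dispatch for the "one package" idea (all names invented):
// main.cpp's and run.cpp's main() would be renamed and linked together.
#include <cstring>

int chat_main(int argc, char ** argv); // renamed main() from main.cpp
int run_main (int argc, char ** argv); // renamed main() from run.cpp

int main(int argc, char ** argv) {
    // "--run" selects the llama-run path; the default stays the familiar
    // llama-cli behavior so existing guides and scripts keep working
    if (argc > 1 && std::strcmp(argv[1], "--run") == 0) {
        return run_main(argc - 1, argv + 1);
    }
    return chat_main(argc, argv);
}
```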

@MaggotHATE
Contributor

To be honest, I'm not sure it's a good idea to add common into examples that were designed without it in mind. It would mean that common is the officially suggested way of creating chat-style applications that are based on llama.cpp - is it, though?

I would rather suggest starting with a slight refurbishment of main - maybe separating everything it has into functions/structs, and then, once it's easier to digest (but still a bit too complex), simplifying it. This way would be safer feature-wise and would not break the user experience.

@ngxson
Collaborator Author

ngxson commented Jan 12, 2025

> To be honest, I'm not sure it's a good idea to add common into examples that were designed without it in mind. It would mean that common is the officially suggested way of creating chat-style applications that are based on llama.cpp - is it, though?

What I mean by using common is that, because these examples are meant to stay within llama.cpp, reusing common among the llama.cpp examples can be beneficial because:

  • In the long run, it makes maintenance easier, especially when there are breaking changes
  • It avoids long build times as the example's code grows
  • It makes the code easier to read

And no, I'm not encouraging developers to use it in downstream projects. My point is that common is useful inside llama.cpp.

> I would rather suggest starting with a slight refurbishment of main - maybe separating everything it has into functions/structs, and then, once it's easier to digest (but still a bit too complex), simplifying it. This way would be safer feature-wise and would not break the user experience.

My POV is that this is much easier said than done. main.cpp works the way it does for historical reasons (i.e. it tracks the inference based on tokens, not on message pairs), so moving things around can make the code a bit easier to read, but not easier to build more chat-related features on top of.

(For example, the /regen command is easier to implement if we track user-assistant message pairs instead of counting tokens.)
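To make the /regen point concrete, here is a hypothetical message-pair history (not main.cpp's actual state) where regeneration is a single vector operation:

```cpp
// Hypothetical sketch: with a message-pair history, /regen just drops the
// last assistant reply and re-generates; a token-based history would have
// to locate and rewind the corresponding KV-cache range instead.
#include <string>
#include <vector>

struct chat_msg {
    std::string role;    // "user" or "assistant"
    std::string content;
};

struct chat_history {
    std::vector<chat_msg> messages;

    // /regen: discard the last assistant reply (keeping the user turn);
    // the caller then re-runs generation from the remaining messages
    bool regen() {
        if (messages.empty() || messages.back().role != "assistant") {
            return false; // nothing to regenerate
        }
        messages.pop_back();
        return true;
    }
};
```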

@MaggotHATE
Contributor

> based on tokens, not on message pairs

Is it bad, though? I'm sure it's a valid way to organize inference in a chat app, but it is definitely less convenient to work with. On the other hand, it's more transparent in its own way, so maybe there's value in keeping main this way as a learning example. I understand the problem of maintaining it, though: adding features would require following the same token-based approach.

I need to look at the most recent version again and see what can be done.
