Feature Request: Better chat UX for llama-cli #11202

Open
ngxson opened this issue Jan 12, 2025 · 8 comments
Labels: enhancement (New feature or request)

Comments

@ngxson
Collaborator

ngxson commented Jan 12, 2025

Feature Description

Motivation

My first attempt at adding commands to llama-cli suggests that main.cpp is a bit too complex for a chat-based application.

Now, you might ask: don’t we already have llama-run and simple-chat for that? True, but the issue is that these examples are largely standalone - they don’t rely on common.cpp.

Given that most users are already familiar with llama-cli, enhancing the chat UX within it could benefit a majority of users, including those who don’t actively follow llama.cpp’s development. This approach is similar to what we did with the updated UI for llama-server: a seamless, drop-in replacement that eliminates the need for users to change their workflows.

Possible Implementation

One way to do this is to split all the chat-related logic into main/chat.hpp and call it from main.cpp; see #11203.
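For illustration, here is a rough sketch of what such a main/chat.hpp split could look like (hypothetical names, not the actual #11203 code): chat state and slash-commands live behind a small struct that main.cpp drives, instead of being interleaved with the token loop.

```cpp
// main/chat.hpp -- hypothetical sketch of the proposed split;
// all names here are illustrative only
#pragma once

#include <string>
#include <vector>

struct chat_message {
    std::string role;    // "user" or "assistant"
    std::string content;
};

struct chat_session {
    std::vector<chat_message> messages;

    // append a user turn; main.cpp then formats and tokenizes it
    void add_user(const std::string & text) {
        messages.push_back({"user", text});
    }

    // append the model's reply once generation finishes
    void add_assistant(const std::string & text) {
        messages.push_back({"assistant", text});
    }

    // handle slash-commands such as /regen or /readfile;
    // returns true if the line was a command
    bool handle_command(const std::string & line);
};
```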

CC @ggerganov and @slaren for discussion, thanks!

@ngxson ngxson added the enhancement label Jan 12, 2025
@ggerganov
Owner

I think it's OK to repurpose llama-cli mainly for chat and remove all the extra functionality such as context extension, custom prefix/suffixes, antiprompts, sessions, etc. These can be separated in a new example oriented for developers to experiment with this kind of special functionality.

#11203 looks OK to me. IMO we can make a quick pass to strip main.cpp of the special features that I mentioned above - let me know if you want me to help with this. Later, we'll re-implement those depending on the feedback about their removal.

@slaren
Collaborator

slaren commented Jan 12, 2025

simple-chat is meant as a demonstration of the API; it was never intended to be a complete chat program. llama-run, however, is. I am not sure that it is a good idea to have two examples competing to do the same job.

I disagree about breaking llama-cli and just re-adding features based on feedback. IMO we need to take backwards compatibility a lot more seriously than we do, at some point if we keep breaking other people's programs and scripts, they will just give up and move elsewhere. We need to have the mindset that each new API or flag added will need to be supported for a long time.

@MaggotHATE
Contributor

Prefix and suffix are definitely needed in main or its substitute: for now, most examples use predefined chat templates only. That's not always convenient - there are cases when a custom template is needed, or when the existing template needs to be extended (for example, the suffix might contain the beginning of the input as additional guidance). I see that this is simply combined with common_chat_apply_template, so it shouldn't be a problem.

As for antiprompts, aren't they necessary for base (not instruct) models and custom instructions? Would eos be enough?
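A minimal sketch of the prefix/suffix extension described above, with a toy stand-in for the templating call (common_chat_apply_template's real signature is not assumed here):

```cpp
// Sketch only: apply_chat_template below is a toy stand-in for the real
// templating call; the tokens used are assumptions for illustration.
#include <string>

static std::string apply_chat_template(const std::string & user_text) {
    // toy predefined template
    return "<|user|>\n" + user_text + "\n<|assistant|>\n";
}

// A custom prefix/suffix extends the predefined template rather than
// replacing it: the suffix can seed the start of the model's reply as
// additional guidance, which a template alone cannot express.
static std::string build_prompt(const std::string & user_text,
                                const std::string & prefix,
                                const std::string & suffix) {
    return prefix + apply_chat_template(user_text) + suffix;
}

// e.g. build_prompt("Summarize this file.", "", "Sure, here is a summary:")
```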

@ggerganov
Owner

Yes, I guess the functions such as /regen, /readfile, etc. would make more sense to be added to the llama-run example to avoid having two chat-focused examples. If we do it like this, then llama-cli probably no longer needs to support chat mode and can be simplified?

@ngxson
Collaborator Author

ngxson commented Jan 12, 2025

> simple-chat is meant as a demonstration of the API; it was never intended to be a complete chat program. llama-run, however, is. I am not sure that it is a good idea to have two examples competing to do the same job.

The main problem with llama-run is that it doesn't use common.cpp, so it seems more like a full production-ready program to me (i.e. it should be a dedicated repo instead of an example inside llama.cpp).

My plan is to try not to break too many things while doing #11203. @MaggotHATE Suffix and prefix are easy to add back; I totally understand that many users still use them for alpaca-style chatbots. There may be other things to worry about too, but I'll make a list once we're clearer about the overall direction.

> If we do it like this, then llama-cli probably no longer needs to support chat mode and can be simplified?

Yup, that's one way. But the problem is that most guides I can find on the internet already tell users to use llama-cli, so we would need to ask users to migrate to llama-run. To me, removing chat support from llama-cli is a bit like giving up a domain name that already has a good SEO score (I'm talking about you, twitter.com).

A hypothetical better plan could be:

  • Use more common_ in llama-run
  • Compile both run.cpp and main.cpp into one package and distribute it as llama-cli (a rough dispatch sketch follows)
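A rough sketch of the second bullet, with invented names: both entry points are renamed, linked into one binary, and a flag selects the code path.

```cpp
// Hypothetical dispatch for the "one package" idea (all names invented):
// main.cpp's and run.cpp's main() would be renamed and linked together.
#include <cstring>

int chat_main(int argc, char ** argv); // renamed main() from main.cpp
int run_main (int argc, char ** argv); // renamed main() from run.cpp

int main(int argc, char ** argv) {
    // "--run" selects the llama-run path; the default stays the familiar
    // llama-cli behavior so existing guides and scripts keep working
    if (argc > 1 && std::strcmp(argv[1], "--run") == 0) {
        return run_main(argc - 1, argv + 1);
    }
    return chat_main(argc, argv);
}
```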

@MaggotHATE
Contributor

To be honest, I'm not sure it's a good idea to add common into examples that were designed without it in mind. It would mean that common is the officially suggested way of creating chat-style applications that are based on llama.cpp - is it, though?

I would rather suggest starting with a slight refurbishment of main - maybe separating everything it has into functions/structs, and then, once it's easier to digest (but still a bit too complex), simplifying it. This way would be safer feature-wise and would not break the user experience.

@ngxson
Collaborator Author

ngxson commented Jan 12, 2025

> To be honest, I'm not sure it's a good idea to add common into examples that were designed without it in mind. It would mean that common is the officially suggested way of creating chat-style applications that are based on llama.cpp - is it, though?

What I mean by using common is that, because these examples are meant to stay within llama.cpp, reusing common among the llama.cpp examples can be beneficial because:

  • In the long run, it makes maintenance easier, especially when there are breaking changes
  • It avoids long build times as the example's code grows
  • It makes the code easier to read

And no, I'm not encouraging developers to use it in downstream projects. My point is that common is useful inside llama.cpp.

> I would rather suggest starting with a slight refurbishment of main - maybe separating everything it has into functions/structs, and then, once it's easier to digest (but still a bit too complex), simplifying it. This way would be safer feature-wise and would not break the user experience.

My POV is that this is much easier said than done. main.cpp works the way it does for historical reasons (i.e. it tracks the inference based on tokens, not on message pairs), so moving things around can make the code a bit easier to read, but not easier to build more chat-related features on top of.

(For example, the /regen command is easier to implement if we track user-assistant message pairs instead of counting tokens.)
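To make the /regen point concrete, here is a hypothetical message-pair history (not main.cpp's actual state) where regeneration is a single vector operation:

```cpp
// Hypothetical sketch: with a message-pair history, /regen just drops the
// last assistant reply and re-generates; a token-based history would have
// to locate and rewind the corresponding KV-cache range instead.
#include <string>
#include <vector>

struct chat_msg {
    std::string role;    // "user" or "assistant"
    std::string content;
};

struct chat_history {
    std::vector<chat_msg> messages;

    // /regen: discard the last assistant reply (keeping the user turn);
    // the caller then re-runs generation from the remaining messages
    bool regen() {
        if (messages.empty() || messages.back().role != "assistant") {
            return false; // nothing to regenerate
        }
        messages.pop_back();
        return true;
    }
};
```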

@MaggotHATE
Contributor

> based on tokens, not on message pairs

Is it bad, though? I'm sure it's a valid way to organize inference in a chat app, but it is definitely less convenient to work with. On the other hand, it's more transparent in its own way, so maybe there's value in keeping main this way as a learning example. I understand the problem of maintaining it, though: adding features would require following the same token-based approach.

I need to look at the most recent version again and see what can be done.
