
Configurable kvcache & fix repeat chat history #41

Merged (8 commits) on Jun 20, 2024

Conversation

guoqingbao (Collaborator):

This PR fixes several issues related to chat history, making candle-vllm compatible with ChatGPT-style frontends (chat UIs).

  1. The kvcache can now be configured with the parameter `kvcache_mem`, which specifies the amount of GPU memory reserved for the kvcache in MB (default: 4096 MB). From this budget, candle-vllm calculates the number of GPU blocks used for the kvcache.

  2. Chat history management can be configured with the parameter `record_conversation`. By default, candle-vllm does not record chat history; instead, the client sends both the new messages and the contextual history to candle-vllm. This resolves the issue of the chat history being recorded twice, once by the client and once by candle-vllm. If `record_conversation` is set to `true`, the client sends only new chat messages, and candle-vllm is responsible for recording the previous ones. However, that approach requires per-session chat recording, which is not yet implemented, so the default is recommended.
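The block calculation in point 1 can be sketched as follows. This is an illustrative sketch only, not candle-vllm's actual code: the function names, the block size of 16 tokens, the f16 dtype, and the example model dimensions are all assumptions made for the example.

```rust
// Hypothetical sketch: derive a GPU block count from a kvcache memory
// budget given in MB. All constants and names here are assumptions for
// illustration, not candle-vllm's real values.
const BLOCK_SIZE: usize = 16; // tokens per cache block (assumed)
const DTYPE_BYTES: usize = 2; // f16 key/value entries (assumed)

/// Bytes needed to cache one token's key and value across all layers.
fn bytes_per_token(num_layers: usize, num_kv_heads: usize, head_dim: usize) -> usize {
    // 2 = one key vector plus one value vector per layer and head.
    2 * num_layers * num_kv_heads * head_dim * DTYPE_BYTES
}

/// Number of GPU cache blocks that fit in `kvcache_mem_mb` MB.
fn num_gpu_blocks(
    kvcache_mem_mb: usize,
    num_layers: usize,
    num_kv_heads: usize,
    head_dim: usize,
) -> usize {
    let budget_bytes = kvcache_mem_mb * 1024 * 1024;
    budget_bytes / (BLOCK_SIZE * bytes_per_token(num_layers, num_kv_heads, head_dim))
}

fn main() {
    // Example: a 7B-class model (32 layers, 32 KV heads, head_dim 128)
    // with the default 4096 MB budget.
    let blocks = num_gpu_blocks(4096, 32, 32, 128);
    println!("{blocks}"); // → 512
}
```

With this scheme, raising `kvcache_mem` linearly increases the number of blocks, and therefore the total context the server can keep cached.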

I discovered that while the kvcache is cleared after each request, the corresponding chat history in candle-vllm was not, so previous chats leaked into new ones. This PR addresses that issue as well.
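The two history modes described above can be sketched as follows. This is a minimal illustration, not candle-vllm's implementation: the `Session` type and `context_for` method are hypothetical names invented for the example.

```rust
// Illustrative sketch of the two chat-history modes (hypothetical types).
struct Session {
    record_conversation: bool,
    stored_history: Vec<String>, // only used when record_conversation = true
}

impl Session {
    /// Build the full prompt context for one incoming request.
    fn context_for(&mut self, incoming: Vec<String>) -> Vec<String> {
        if self.record_conversation {
            // Server-side recording: `incoming` holds only the new messages,
            // so the server prepends its stored per-session history.
            self.stored_history.extend(incoming);
            self.stored_history.clone()
        } else {
            // Stateless default: `incoming` already contains the history.
            // Recording it here as well would duplicate every turn, which
            // is exactly the repeated-history bug this PR fixes.
            incoming
        }
    }
}

fn main() {
    let mut s = Session { record_conversation: false, stored_history: Vec::new() };
    let ctx = s.context_for(vec![
        "user: hi".into(),
        "assistant: hello".into(),
        "user: how are you?".into(),
    ]);
    // The server stays stateless: the context is exactly what the client sent.
    assert_eq!(ctx.len(), 3);
    assert!(s.stored_history.is_empty());
}
```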

I will create a demo video showcasing chat conversations using popular chat UIs with candle-vllm as the backend service.

EricLBuehler (Owner) left a review:

Thanks for this PR, it looks great! I just made a few comments regarding some magic values, and calculating the number of CPU blocks.

(Three review comments on src/main.rs, all resolved.)
guoqingbao (Collaborator, Author):

> Thanks for this PR, it looks great! I just made a few comments regarding some magic values, and calculating the number of CPU blocks.

I have created a demo chat for candle-vllm, which includes instructions for running a candle-vllm-backed chat conversation and a demo video. If you think it is suitable, I can include it in the ReadMe of candle-vllm.

https://github.com/guoqingbao/ChattierGPT-UI

EricLBuehler (Owner):

> I have created a demo chat for candle-vllm, which includes instructions for running a candle-vllm-backed chat conversation and a demo video. If you think it is suitable, I can include it in the ReadMe of candle-vllm.
>
> https://github.com/guoqingbao/ChattierGPT-UI

That would be great, please feel free to do so!

guoqingbao (Collaborator, Author):

> That would be great, please feel free to do so!

The instructions for ChatUI and the demo video have been added.

EricLBuehler (Owner) left a review:

Great, looks good!

EricLBuehler merged commit c531fc3 into EricLBuehler:master on Jun 20, 2024 (5 checks passed).