Configurable kvcache & fix repeat chat history #41
Conversation
Thanks for this PR, it looks great! I just made a few comments regarding some magic values and the calculation of the number of CPU blocks.
I have created a demo chat for candle-vllm, which includes instructions for running a candle-vllm-backed chat conversation and a demo video. If you think it is suitable, I can include it in the README of candle-vllm.
That would be great, please feel free to do so!
The instructions for ChatUI and the demo video have been added.
Great, looks good!
In this PR, several issues related to chat history have been fixed, making candle-vllm compatible with ChatGPT-like frontends (chat UIs).
The use of the kvcache can now be configured with the parameter `kvcache_mem`, which specifies the amount of GPU memory reserved for the kvcache, in MB (default: 4096 MB). Candle-vllm calculates the number of GPU blocks used for the kvcache from this value.
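To make that calculation concrete, here is a minimal sketch of how a memory budget in MB can translate into a number of cache blocks. This is not the actual candle-vllm code; the function names, block size, and model shape below are illustrative assumptions.

```rust
/// Bytes required for one cache block: key + value tensors across all layers.
fn block_bytes(block_size: usize, num_layers: usize, num_kv_heads: usize,
               head_dim: usize, dtype_size: usize) -> usize {
    // Factor of 2: one tensor for keys, one for values.
    2 * block_size * num_layers * num_kv_heads * head_dim * dtype_size
}

/// Number of whole blocks that fit in the configured kvcache budget.
fn num_gpu_blocks(kvcache_mem_mb: usize, bytes_per_block: usize) -> usize {
    kvcache_mem_mb * 1024 * 1024 / bytes_per_block
}

fn main() {
    // Hypothetical shape: 16-token blocks, 32 layers, 32 kv heads,
    // head dim 128, f16 (2 bytes per element) -> 8 MB per block.
    let per_block = block_bytes(16, 32, 32, 128, 2);
    // With the 4096 MB default from this PR, that yields 512 blocks.
    println!("{} GPU blocks fit in 4096 MB", num_gpu_blocks(4096, per_block));
}
```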
Chat history management can be configured with the parameter `record_conversation`. By default, candle-vllm does not record chat history; instead, the client sends both the new messages and the contextual history to candle-vllm with each request. This resolves the issue of chat history being recorded twice, once by the client and once by candle-vllm. If `record_conversation` is set to `true`, the client sends only new chat messages, and candle-vllm is responsible for recording the previous ones. However, this approach requires per-session chat recording, which is not yet implemented, so the default approach is recommended.

I also discovered that while the kvcache is cleared after each request, the corresponding chat history in candle-vllm was not, causing previous chats to affect new ones. This PR addresses that issue as well.
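For illustration, under the default `record_conversation = false` mode, each request carries the full conversation so far. Assuming an OpenAI-style chat completions payload (the format ChatGPT-like UIs typically send; the model name and messages here are hypothetical), a third-turn request might look like:

```json
{
  "model": "llama-7b",
  "messages": [
    { "role": "user", "content": "What is candle-vllm?" },
    { "role": "assistant", "content": "A Rust-based inference server built on candle." },
    { "role": "user", "content": "How do I configure the kvcache size?" }
  ]
}
```

Because the client resends the history every turn, the server can stay stateless across requests, which is why no per-session recording is needed in this mode.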
I will create a demo video showcasing chat conversations with popular chat UIs using candle-vllm as the backend service.