Merge pull request #68 from EricLBuehler/develop
Update demo video
guoqingbao authored Jul 26, 2024
2 parents e922750 + c42266e commit eb41272
Showing 3 changed files with 5 additions and 3 deletions.
README.md: 6 changes (4 additions, 2 deletions)
@@ -33,8 +33,10 @@ Currently, candle-vllm supports chat serving for the following models.
 | #12 | Moondream-2 (Multimodal LLM) |TBD|TBD|
 
 
-## Demo Chat with candle-vllm (71 tokens/s, LLaMa2 7B, bf16, on A100)
-<img src="./res/candle-vllm-demo.gif" width="90%" height="90%" >
+## Demo Chat with candle-vllm (61-65 tokens/s, LLaMa3.1 8B, bf16, on A100)
+
+https://github.com/user-attachments/assets/290d72d8-d5e6-41a3-8bd8-1d9d732aee3b
+
 
 ## Usage
 See [this folder](examples/) for some examples.
res/candle-vllm-demo.gif: binary file removed (not shown)
src/openai/pipelines/pipeline.rs: 2 changes (1 addition, 1 deletion)
@@ -327,7 +327,7 @@ impl ModelLoader for DefaultLoader {
             stop_token_ids.push(eos_token);
         }
 
-        //custome stop tokens
+        //custom stop tokens
         if let Some(custom_stop) = &config.custom_stop_tokens {
            for stop in custom_stop {
                match tokenizer.get_token(&stop) {
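For readers skimming the hunk: the change is comment-only (a spelling fix), and the surrounding loop resolves each configured stop string to a token id before pushing it onto `stop_token_ids`. Below is a minimal standalone sketch of that pattern. It calls the `tokenizers` crate's `token_to_id` directly; candle-vllm's `tokenizer.get_token` is a project helper, and the `Config` struct here is illustrative rather than the repo's real type.

```rust
use tokenizers::Tokenizer;

// Illustrative stand-in for the pipeline config; the real field lives
// on candle-vllm's pipeline configuration, not this struct.
struct Config {
    custom_stop_tokens: Option<Vec<String>>,
}

// Resolve configured stop strings to token ids, skipping any string
// the tokenizer does not know as a single vocabulary entry.
fn collect_stop_token_ids(tokenizer: &Tokenizer, config: &Config) -> Vec<u32> {
    let mut stop_token_ids = Vec::new();

    // custom stop tokens
    if let Some(custom_stop) = &config.custom_stop_tokens {
        for stop in custom_stop {
            match tokenizer.token_to_id(stop) {
                Some(id) => stop_token_ids.push(id),
                None => {} // not a single token in the vocabulary; skipped
            }
        }
    }
    stop_token_ids
}
```

Since the fix touches only a comment, generation behavior is unchanged by this hunk.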
