
Support Yi & StableLM models, change default maximum length of generated tokens for smooth chat. #57

Merged
3 commits merged on Jul 15, 2024

Conversation

guoqingbao
Collaborator

Key changes in this PR:

  1. Support Yi & StableLM models
  2. The default maximum length of generated tokens is changed to 1/5 of max_seq_len; this enables smooth chat with minimal breaks. Users can still set max_gen_tokens for the model through parameters.
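The default-cap logic described in point 2 can be sketched as follows. This is a hedged illustration, not the actual PR code: the function name `default_max_gen_tokens` and its signature are hypothetical; only the 1/5-of-`max_seq_len` default and the user override come from the PR description.

```rust
// Hypothetical sketch of the new default (not the actual candle-vllm code):
// cap generated tokens at 1/5 of the context window unless the user
// explicitly sets max_gen_tokens.
fn default_max_gen_tokens(max_seq_len: usize, user_override: Option<usize>) -> usize {
    // 1/5 of the context window keeps individual responses short enough
    // that interactive chat streams back with minimal breaks.
    user_override.unwrap_or(max_seq_len / 5)
}

fn main() {
    // e.g. a 4096-token context gives an 819-token default cap
    println!("{}", default_max_gen_tokens(4096, None));
    // an explicit setting takes precedence over the default
    println!("{}", default_max_gen_tokens(4096, Some(2048)));
}
```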

Tested cases:

cargo run --release -- --port 2000 --weight-path /home/stablelm-zephyr-3b/ stable-lm --repeat-last-n 32
cargo run --release -- --port 2000 --weight-path /home/yi-6b/ yi --repeat-last-n 32

@guoqingbao guoqingbao merged commit 0be4121 into master Jul 15, 2024
5 checks passed