Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Streamer] Fix UTF-8 handling in streamer (mlc-ai#2978)
This PR fixes a bug in the streamer handling for UTF-8 characters. Prior to this PR, the streamer has an assumption that a replacement character (`�`) always correspond to an entire token. However, for the Qwen2 model tokenizer, some token can be ` �` if decoded directly, which breaks the assumption and leads to incorrect result generated by the streamer. This PR fixes this issue with a safer behavior that does not rely on such an assumption.
- Loading branch information