fix: crash on token not found at spm
Signed-off-by: thxCode <[email protected]>
thxCode committed Aug 6, 2024
1 parent 0b90345 commit 6ed2f79
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion src/llama-vocab.cpp
@@ -279,7 +279,13 @@ struct llm_tokenizer_spm {
                 // output any symbols that did not form tokens as bytes.
                 output.reserve(output.size() + symbol.n);
                 for (int j = 0; j < (int)symbol.n; ++j) {
-                    llama_vocab::id token_id = llama_byte_to_token_impl(vocab, symbol.text[j]);
+                    llama_vocab::id token_id;
+                    try {
+                        token_id = llama_byte_to_token_impl(vocab, symbol.text[j]);
+                    } catch(const std::exception & e) {
+                        // not found, use UNK token instead.
+                        token_id = vocab.special_unk_id;
+                    }
                     output.push_back(token_id);
                 }
                 return;
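
The change above can be tried in isolation with a small stand-alone sketch. Everything in it is hypothetical and only mirrors the shape of the patched loop: `toy_vocab` stands in for `llama_vocab`, `toy_byte_to_token` stands in for `llama_byte_to_token_impl`, and the assumption that a missing byte token surfaces as `std::out_of_range` from a map lookup is ours, not something this commit states.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for llama_vocab: only the fields this sketch needs.
struct toy_vocab {
    using id = int32_t;
    std::unordered_map<std::string, id> token_to_id;
    id special_unk_id = 0;
};

// Hypothetical stand-in for llama_byte_to_token_impl.
// Looks up the "<0xNN>" byte token; unordered_map::at throws
// std::out_of_range when the vocabulary has no such entry (assumed behavior).
toy_vocab::id toy_byte_to_token(const toy_vocab & vocab, uint8_t ch) {
    static const char * hex = "0123456789ABCDEF";
    const std::string key = { '<', '0', 'x', hex[ch >> 4], hex[ch & 15], '>' };
    return vocab.token_to_id.at(key);
}

// Emit one token per input byte, substituting the UNK id for any byte
// that has no token instead of letting the exception escape.
std::vector<toy_vocab::id> bytes_to_tokens(const toy_vocab & vocab, const std::string & text) {
    std::vector<toy_vocab::id> output;
    output.reserve(text.size());
    for (unsigned char ch : text) {
        toy_vocab::id token_id;
        try {
            token_id = toy_byte_to_token(vocab, ch);
        } catch (const std::out_of_range &) {
            // not found, use UNK token instead.
            token_id = vocab.special_unk_id;
        }
        output.push_back(token_id);
    }
    return output;
}
```

With a vocabulary that only contains `<0x41>`, calling `bytes_to_tokens(vocab, "AB")` yields the id for `A` followed by `special_unk_id` for `B`, rather than terminating on the uncaught exception. The sketch catches the narrower `std::out_of_range` and leaves the exception object unnamed; the commit itself catches `std::exception`, which is broader.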