Fix split comment
abuelnasr0 committed Feb 21, 2024
Commit ab7b48a (parent: 464555c)
1 changed file, 1 addition and 1 deletion: keras_nlp/tokenizers/byte_pair_tokenizer.py
@@ -129,7 +129,7 @@ def split_strings_for_bpe(inputs, unsplittable_tokens_pattern=None):
     # applying this split directly, because otherwise we will not split
     # unsplittable tokens from inputs properly, because of this pattern
     # ` ?[^\s\p{L}\p{N}{special_spaces}]+`.
-    # e.g., [" <s>"] will be [" <s>"] instead of [" ", "<s>"]
+    # e.g., [" </s>"] will be [" </", "s", ">"] instead of [" ", "</s>"]
     raw_tokens = tf_text.regex_split(
         raw_tokens,
         split_pattern_1_with_unsplittable_tokens,
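
For context, the failure mode the corrected comment describes can be reproduced directly with tensorflow_text. A minimal sketch, assuming the pattern fragment quoted in the comment with the {special_spaces} alternatives dropped for brevity:

import tensorflow_text as tf_text

# Simplified form of the pattern named in the comment above;
# the {special_spaces} alternatives are omitted for brevity.
pattern = r" ?[^\s\p{L}\p{N}]+"

# Splitting " </s>" directly tears the special token apart:
# " </" and ">" both match the pattern, leaving "s" behind
# as a plain letter token in between.
tokens = tf_text.regex_split(
    [" </s>"],
    delim_regex_pattern=pattern,
    keep_delim_regex_pattern=pattern,
)
print(tokens.to_list())  # [[b' </', b's', b'>']]

This is why, per the comment, the unsplittable tokens must be split out of the input first (yielding [" ", "</s>"]) before the general split pattern is applied.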
