From 487ad01a686ce2b6282d83c708cfd72f923f7715 Mon Sep 17 00:00:00 2001
From: finetune <82650881+finetunej@users.noreply.github.com>
Date: Tue, 13 Jun 2023 19:03:58 +0200
Subject: [PATCH] Update README.md

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index ded5d86..4bcba3e 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,10 @@ For comparison, the LLaMa tokenizer contains 23964 tokens made up only of latin
 
 The JavaScript implementation used by the NovelAI frontend can be found [here](https://github.com/NovelAI/nai-js-tokenizer).
 
+## V2
+
+For V2, the original digit special tokens were replaced with English contractions. Digits will therefore be encoded using the corresponding byte tokens instead.
+
 ## License
 
 The tokenizer is licensed under the GNU General Public License, version 2.
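
To make the V2 change concrete, here is a minimal sketch of byte-token fallback: when a vocabulary has no dedicated tokens for digits, each digit character is emitted as the token for its raw byte value. This is not the NovelAI tokenizer code; the vocabulary entries, the id layout (byte tokens at ids 0-255), and the greedy longest-match loop standing in for the real merge algorithm are all invented for illustration.

```python
# Toy byte-token fallback. The vocab and id layout are hypothetical and do
# not reflect the actual V2 vocabulary or its tokenization algorithm.

# Hypothetical multi-character tokens, including contraction tokens like
# the ones V2 adds in place of the digit special tokens.
vocab = {"it": 1000, "'s": 1001, "'t": 1002}

def encode(text: str) -> list[int]:
    """Greedy longest-match encoding; anything unmatched falls back to bytes."""
    ids, i = [], 0
    while i < len(text):
        # Try the longest multi-character match first (capped at 8 chars here).
        for j in range(min(len(text), i + 8), i + 1, -1):
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            # No dedicated token: emit one byte token (id == byte value) per
            # UTF-8 byte. Digits always land here in this toy vocabulary.
            ids.extend(text[i].encode("utf-8"))
            i += 1
    return ids

print(encode("it's 42"))  # -> [1000, 1001, 32, 52, 50]; '4' and '2' are byte tokens
```

In this sketch, "it" and "'s" match invented multi-character tokens, while the space and the digits "4" and "2" fall through to byte tokens 32, 52, and 50, mirroring how V2 encodes digits once their special tokens are gone.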