From 6dcddf48d53932e7635466a4e52c364abbba7bc9 Mon Sep 17 00:00:00 2001
From: finetune <82650881+finetunej@users.noreply.github.com>
Date: Wed, 7 Jun 2023 20:55:30 +0200
Subject: [PATCH] Add link to JS tokenizer code

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 0191e8d..d3e7ce6 100644
--- a/README.md
+++ b/README.md
@@ -24,6 +24,10 @@ Finally, I would like to give some stats about token distribution. Our tokenizer
 
 For comparison, the LLaMa tokenizer contains 23964 tokens made up only of latin alphabet characters, no Japanese token longer than a single character, 836 Japanese characters and 7224 other tokens.
 
+## JavaScript implementation
+
+The JavaScript implementation used by the NovelAI frontend can be found [here](https://github.com/NovelAI/nai-js-tokenizer).
+
 ## License
 
 The tokenizer is licensed under the GNU General Public License, version 2.