Commit

Add link to JS tokenizer code
finetunej authored Jun 7, 2023
1 parent 8cf150f commit 6dcddf4
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -24,6 +24,10 @@ Finally, I would like to give some stats about token distribution. Our tokenizer

For comparison, the LLaMa tokenizer contains 23964 tokens made up only of latin alphabet characters, no Japanese token longer than a single character, 836 Japanese characters and 7224 other tokens.

## JavaScript implementation

The JavaScript implementation used by the NovelAI frontend can be found [here](https://github.com/NovelAI/nai-js-tokenizer).

## License

The tokenizer is licensed under the GNU General Public License, version 2.
