
Update README.md
finetunej authored Jun 13, 2023
1 parent b208a11 commit 487ad01
Showing 1 changed file with 4 additions and 0 deletions.
README.md: 4 additions & 0 deletions
@@ -28,6 +28,10 @@ For comparison, the LLaMa tokenizer contains 23964 tokens made up only of latin

The JavaScript implementation used by the NovelAI frontend can be found [here](https://github.com/NovelAI/nai-js-tokenizer).

## V2

For V2, the original digit special tokens were replaced with English contractions. Digits will therefore be encoded using the corresponding byte tokens instead.

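A minimal sketch of how this fallback could be observed, assuming the V2 tokenizer is distributed as a Hugging Face `tokenizers`-compatible `tokenizer.json` (the file path below is illustrative, not an official one): encoding a string containing digits should produce byte-level tokens for the digit characters rather than dedicated digit tokens.

```python
# Minimal sketch, not official usage: assumes the V2 tokenizer ships as a
# Hugging Face `tokenizers`-compatible tokenizer.json; the path is illustrative.
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")

# With the dedicated digit special tokens removed in V2, the "3" here is
# expected to be covered by byte-level fallback tokens.
encoding = tokenizer.encode("Chapter 3")
print(encoding.tokens)  # surface forms of the produced tokens
print(encoding.ids)     # corresponding token ids
```
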
## License

The tokenizer is licensed under the GNU General Public License, version 2.
