diff --git a/README.md b/README.md
index 661d028..1bdb93c 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,8 @@
 Intended use case is calculating token count accurately on the client-side.
 
 Click here for demo
-Features:
+## Features
+
 - Easy to use: 0 dependencies, code and data baked into a single file.
 - Compatible with most LLaMA-based models (see [Compatibility](#compatibility))
 - Optimized running time: tokenize a sentence in roughly 1ms, or 2000 tokens in roughly 20ms.
@@ -52,7 +53,7 @@
 llamaTokenizer.decode([1, 15043, 3186, 29991])
 > 'Hello world!'
 ```
-Special use case: decode only selected individual tokens, without including beginning of prompt token and preceeding space:
+Note that a special "beginning of sentence" token and a preceding space are added by default when encoding (and correspondingly expected when decoding). These affect the token count. In some use cases you may not want them; you can pass additional boolean parameters to disable them. For example, to decode an individual token:
 ```
 llamaTokenizer.decode([3186], false, false)
diff --git a/package.json b/package.json
index e01a037..4d31ea6 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "llama-tokenizer-js",
-  "version": "1.1.0",
+  "version": "1.1.1",
   "description": "JS tokenizer for LLaMA-based LLMs",
   "main": "llama-tokenizer.js",
   "scripts": {