From cb34d22f9439ba9979831f47445f5595da82275d Mon Sep 17 00:00:00 2001
From: Belladore <135602125+belladoreai@users.noreply.github.com>
Date: Sat, 24 Jun 2023 22:04:24 +0300
Subject: [PATCH] Release v1.1.1

---
 README.md    | 5 +++--
 package.json | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 661d028..1bdb93c 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,8 @@
 Intended use case is calculating token count accurately on the client-side.
 
 Click here for demo
-Features:
+## Features
+
 - Easy to use: 0 dependencies, code and data baked into a single file.
 - Compatible with most LLaMA-based models (see [Compatibility](#compatibility))
 - Optimized running time: tokenize a sentence in roughly 1ms, or 2000 tokens in roughly 20ms.
@@ -52,7 +53,7 @@
 llamaTokenizer.decode([1, 15043, 3186, 29991])
 > 'Hello world!'
 ```
 
-Special use case: decode only selected individual tokens, without including beginning of prompt token and preceeding space:
+Note that a special "beginning of sentence" token and a preceding space are added by default when encoding (and are correspondingly expected when decoding). These affect the token count. In some use cases you may not want to add them; you can pass additional boolean parameters to skip them. For example, to decode an individual token:
 
 ```
 llamaTokenizer.decode([3186], false, false)
diff --git a/package.json b/package.json
index e01a037..4d31ea6 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "llama-tokenizer-js",
-  "version": "1.1.0",
+  "version": "1.1.1",
   "description": "JS tokenizer for LLaMA-based LLMs",
   "main": "llama-tokenizer.js",
   "scripts": {
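
For context, here is a minimal sketch of the default behavior the reworded README paragraph describes, pieced together from the examples visible in the diff above. The import path and the `encode` call are assumptions for illustration (the hunks only show `decode`); treat this as a sketch, not the library's documented API.

```
// Sketch of the encode/decode behavior described in the README hunk.
// Assumption: the package exposes a default export `llamaTokenizer`
// with an `encode` counterpart to the `decode` shown in the diff.
import llamaTokenizer from 'llama-tokenizer-js';

// Encoding adds the special "beginning of sentence" token (id 1) and
// a preceding space by default, so both count toward the token count.
const tokens = llamaTokenizer.encode('Hello world!');
console.log(tokens);        // per the README example: [1, 15043, 3186, 29991]
console.log(tokens.length); // 4 -- the count reported to the user

// Decoding expects those defaults, so round-tripping restores the text.
console.log(llamaTokenizer.decode(tokens)); // 'Hello world!'

// Passing the two extra booleans as false skips the BOS token and the
// preceding-space handling, e.g. to decode one token in isolation.
console.log(llamaTokenizer.decode([3186], false, false)); // the bare token text
```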