Releases: belladoreai/llama-tokenizer-js
v1.2.2
Minor fixes to decode. These should have no effect except when the user passes invalid parameter combinations:
- When decoding with add_bos_token set to true, we previously cut the first token unconditionally, assuming it had to be the BOS token. We now check whether it actually is the BOS token, and leave the first token in place if it is something else.
- When decoding with add_preceding_space set to true, we previously assumed the first character of the decoded text had to be a space and cut it out. We now check whether it actually is a space, and leave it in place if it is something else.
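The two guards above can be sketched as follows. This is an illustration of the logic, not the library's actual code; the function name is hypothetical, and the BOS token id 1 is assumed from the usual LLaMA convention.

```javascript
// Sketch of the guarded trimming described above (assumption: LLaMA's
// convention of BOS token id 1; the helper name is hypothetical).
const BOS_TOKEN_ID = 1;

function trimDecodeArtifacts(tokenIds, decodedText, addBosToken, addPrecedingSpace) {
  let ids = tokenIds;
  // Only drop the first token if it really is the BOS token.
  if (addBosToken && ids.length > 0 && ids[0] === BOS_TOKEN_ID) {
    ids = ids.slice(1);
  }
  let text = decodedText;
  // Only strip the leading character if it really is a space.
  if (addPrecedingSpace && text.startsWith(" ")) {
    text = text.slice(1);
  }
  return { ids, text };
}
```

Before this release, both trims were unconditional, so e.g. passing a token array without a leading BOS while add_bos_token was true would silently lose the first real token.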
v1.2.1
TypeScript fix
v1.2.0
- Add TypeScript types definition file
- Refactor tokenizer into a Class
- Allow passing custom vocab and merge data to tokenizer
- Allow passing custom tests to tokenizer test runner
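To illustrate the shape of the custom-vocab/merge change, here is a minimal class-based BPE tokenizer that accepts both as constructor arguments. The class name, constructor signature, and data formats are hypothetical; this is a sketch of the idea, not the library's actual API.

```javascript
// Minimal sketch of a tokenizer class taking custom vocab and merge data
// (hypothetical API, not llama-tokenizer-js itself).
class TinyBpeTokenizer {
  constructor(vocab, merges) {
    this.vocab = vocab; // token string -> token id
    // Earlier merges have lower rank, i.e. higher priority.
    this.ranks = new Map(merges.map((pair, i) => [pair.join(" "), i]));
  }

  encode(text) {
    let parts = Array.from(text);
    // Repeatedly apply the highest-priority (lowest-rank) adjacent merge.
    for (;;) {
      let best = null;
      for (let i = 0; i < parts.length - 1; i++) {
        const rank = this.ranks.get(parts[i] + " " + parts[i + 1]);
        if (rank !== undefined && (best === null || rank < best.rank)) {
          best = { i, rank };
        }
      }
      if (best === null) break;
      parts.splice(best.i, 2, parts[best.i] + parts[best.i + 1]);
    }
    return parts.map((p) => this.vocab[p]);
  }
}
```

Packaging the state (vocab, merge ranks) in a class is what makes swapping in custom data straightforward: each instance carries its own tables instead of reading module-level globals.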
v1.1.3
- Fix a bug in an unused function (so tokenizer results were not affected)
- Support very large inputs (the previous version was not guaranteed to produce correct results for inputs longer than 100,000 characters, although in practice it almost always did)
v1.1.2
Bugfix to support Next.js and other environments where performance.now() is not available.
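A fix like this typically amounts to a timer fallback along the following lines. This is a sketch of the general pattern, not necessarily the exact code in the release.

```javascript
// Use the high-resolution timer when present, otherwise fall back to
// Date.now() (coarser, but available in every JS environment).
const now =
  typeof performance !== "undefined" && typeof performance.now === "function"
    ? () => performance.now()
    : () => Date.now();
```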
v1.1.1
Bugfix affecting results in extremely rare cases: equal-priority merges are now always performed left-to-right.
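The tie-break rule can be sketched as follows: when scanning merge candidates in position order, a strict less-than comparison on priority keeps the earlier (leftmost) candidate on ties. The candidate shape here is hypothetical, for illustration only.

```javascript
// Pick the merge to apply next: lowest prio value wins, and among equal
// priorities the leftmost position wins (candidate objects are hypothetical).
function pickMerge(candidates) {
  const ordered = [...candidates].sort((a, b) => a.pos - b.pos);
  let best = null;
  for (const c of ordered) {
    // Strict "<" never replaces best on an equal prio, so the leftmost
    // candidate with the best priority is the one retained.
    if (best === null || c.prio < best.prio) best = c;
  }
  return best;
}
```

Before this fix, the winner among equal-priority merges could depend on scan order, which in rare inputs produced a different (and non-canonical) tokenization.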
v1.1.0
Add support for different runtimes
v1.0.1
Release 1.0.1
v1.0.0
Release v1.0.0