
Releases: belladoreai/llama-tokenizer-js

v1.2.2

27 Jun 17:59

Minor fixes to decode. These should have no effect except when the user passes invalid parameter combinations:

  • When decoding with add_bos_token set to True, we previously cut out the first token unconditionally, assuming it was the BOS token. We now check whether the first token actually is the BOS token, and leave it in place if it is something else.
  • When decoding with add_preceding_space set to True, we previously assumed the first character of the decoded text was a space and cut it out unconditionally. We now check whether the first character actually is a space, and leave it in place if it is something else.
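The guard logic described above can be sketched roughly as follows. This is an illustrative sketch, not the library's actual internals; the BOS token id of 1 is an assumption (it matches LLaMA's vocabulary).

```javascript
const BOS_TOKEN_ID = 1; // LLaMA's BOS id (assumption for this sketch)

// Strip a leading BOS token only if the first token actually is BOS.
function stripBos(tokenIds) {
  return tokenIds[0] === BOS_TOKEN_ID ? tokenIds.slice(1) : tokenIds;
}

// Strip a leading space only if the decoded text actually starts with one.
function stripPrecedingSpace(text) {
  return text.startsWith(" ") ? text.slice(1) : text;
}
```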

v1.2.1

24 Mar 18:40

TypeScript fix

v1.2.0

24 Mar 18:17
  • Add TypeScript types definition file
  • Refactor tokenizer into a Class
  • Allow passing custom vocab and merge data to tokenizer
  • Allow passing custom tests to tokenizer test runner
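A class-based tokenizer that accepts custom vocab and merge data might take roughly the following shape. The names and constructor signature here are illustrative assumptions, not the library's actual API; consult the project README for the real usage.

```javascript
// Minimal sketch of a class-based tokenizer with pluggable vocab and merges.
class Tokenizer {
  constructor({ vocab, merges }) {
    this.vocab = vocab;   // token string -> id
    this.merges = merges; // "left right" pair -> merge priority
    // Build the reverse mapping once so decode is a simple lookup.
    this.idToToken = new Map(
      Object.entries(vocab).map(([tok, id]) => [id, tok])
    );
  }

  // Decode token ids back to text by concatenating token strings.
  decode(tokenIds) {
    return tokenIds.map((id) => this.idToToken.get(id) ?? "").join("");
  }
}

// Constructing an instance with custom data:
const tokenizer = new Tokenizer({
  vocab: { he: 0, llo: 1 },
  merges: { "he llo": 0 },
});
```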

v1.1.3

07 Aug 21:50
  • Fix a bug in an unused function (no effect on tokenizer results)
  • Support very large inputs (the previous version was not guaranteed to produce correct results for inputs longer than 100 000 characters, although in practice it almost always did)

v1.1.2

01 Aug 07:22

Bugfix to support Next.js and other environments where performance.now() is not available.
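A common way to handle environments that lack performance.now() is a feature-detected fallback to Date.now(). The snippet below shows that general technique; it is a sketch, not necessarily the exact fix the library shipped.

```javascript
// Prefer the high-resolution timer when available, fall back to Date.now()
// in environments (e.g. some server-side rendering contexts) that lack it.
const now =
  typeof performance !== "undefined" && typeof performance.now === "function"
    ? () => performance.now()
    : () => Date.now();

const start = now();
// ... timed work ...
const elapsedMs = now() - start;
```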

v1.1.1

24 Jun 19:07

Bugfix affecting results in extremely rare cases: merges with equal priority are now always performed left-to-right.
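The tie-breaking rule can be illustrated with a small helper that scans for the next BPE merge to apply: among candidate pairs with the lowest (best) priority, it picks the leftmost occurrence. This is an illustrative sketch, not the library's actual implementation.

```javascript
// Find the next merge to perform: the pair with the lowest priority value,
// breaking ties by position (leftmost wins).
function pickNextMerge(tokens, mergePriorities) {
  let best = null;
  for (let i = 0; i < tokens.length - 1; i++) {
    const pair = tokens[i] + " " + tokens[i + 1];
    const prio = mergePriorities[pair];
    if (prio === undefined) continue;
    // Strict "<" means an equal-priority pair found later never replaces
    // an earlier one, so equal-priority merges resolve left-to-right.
    if (best === null || prio < best.prio) {
      best = { index: i, prio };
    }
  }
  return best; // { index, prio } or null if no merge applies
}
```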

v1.1.0

16 Jun 13:44

Add support for different runtimes

v1.0.1

13 Jun 20:27
Release 1.0.1

v1.0.0

12 Jun 16:33
Release v1.0.0