v6.0.0

@davidmezzetti released this on 10 Aug 09:14

🥳 We're excited to announce the release of txtai 6.0 🥳

This significant milestone release marks txtai's 3-year birthday! 🎉 If you like txtai, please remember to give it a ⭐!

6.0 adds sparse indexes, hybrid indexes and subindexes to the embeddings interface. It also consolidates the Sequences and Generator pipelines into a single LLM pipeline. See below for more.
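Hybrid search combines a sparse (keyword) score and a dense (vector) score for each document. The sketch below shows one common way to do this: a weighted sum of min-max normalized scores. The function names `min_max` and `hybrid_scores` and the 0.5 default weight are assumptions for illustration; this is not necessarily how txtai combines scores internally.

```python
def min_max(scores):
    """Scale a dict of {id: score} into [0, 1]; a constant set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_scores(dense, sparse, weight=0.5):
    """Rank ids by weight * dense + (1 - weight) * sparse (normalized).

    An id missing from one result set contributes 0.0 for that side.
    """
    dense, sparse = min_max(dense), min_max(sparse)
    ids = set(dense) | set(sparse)
    combined = {
        i: weight * dense.get(i, 0.0) + (1 - weight) * sparse.get(i, 0.0)
        for i in ids
    }
    # Highest combined score first
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: cosine similarities (dense) and BM25 scores (sparse)
dense = {"a": 0.82, "b": 0.75, "c": 0.40}
sparse = {"b": 7.1, "c": 2.3}
ranked = hybrid_scores(dense, sparse)
print([doc for doc, _ in ranked])  # ['b', 'a', 'c']
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities are not; without it, the sparse side would dominate the sum.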

Breaking changes

The vast majority of changes are fully backwards compatible, and new features are only enabled when specified. The only breaking change is in the Scoring terms interface, where the index format changed. The main Scoring interface used for word vector weighting is unchanged.

New Features

  • Better BM25 (#508)
  • Hybrid Search (#509)
  • Add additional indexes for embeddings (#515)
  • Refactor Sequences and Generator pipeline into single LLM pipeline (#494)
  • Support passing model parameters in pipelines (#500)
  • Add "auto-id" capability to Embeddings (#502)
  • Add UUID auto-id (#505)
  • Add keyword arguments to Embeddings constructor (#503)
  • Add top level imports (#514)
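The "Better BM25" item concerns sparse keyword scoring. As a rough illustration of the formula involved, the sketch below implements textbook Okapi BM25 in plain Python. It is not txtai's implementation: the whitespace tokenizer, the function name `bm25_scores` and the `k1`/`b` defaults are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25.

    score(d) = sum over query terms t of
        idf(t) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
    Assumes a non-empty list of documents.
    """
    # Assumed tokenizer: lowercase and split on whitespace
    docs = [doc.lower().split() for doc in documents]
    terms = query.lower().split()
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term
    df = Counter(t for d in docs for t in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in terms:
            if tf[t] == 0:
                continue
            # Smoothed idf that stays positive even for very common terms
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


documents = ["the quick brown fox", "the lazy dog", "quick quick fox jumps"]
scores = bm25_scores("quick fox", documents)
```

Here the third document scores highest because "quick" appears twice in a document of average length, and the second document scores 0.0 because it shares no terms with the query.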

Improvements

  • Add NumPy ANN Backend (#468)
  • Add PyTorch ANN Backend (#469)
  • Add notebook covering embeddings configuration options (#470)
  • make data - No such file or directory (#473)
  • Improve derivation of default embeddings model path (#476)
  • Add accelerate dependency (#477)
  • Add baseball example application (#484)
  • Update minimum Python version to 3.8 (#485)
  • Add WAL option for SQLite (#488)
  • Add support for alternative acceleration devices (#489)
  • Add support for passing torch devices to embeddings and pipelines (#492)
  • Documentation updates (#495)
  • Improve Pooling tokenizer load method (#499)
  • Add ability for extractor to reference another pipeline in applications (#501)
  • Reorganize embeddings configuration documentation (#504)
  • Support Unicode Text Segmentation in Tokenizer (#507)
  • ANN improvements (#510)
  • Add multilingual graph topic modeling (#511)
  • Add support for configurable text/object fields (#512)
  • Update documentation for 6.0 (#513)
  • Add count method to database (#517)
  • Improvements when indexing through Applications (#518)
  • Add what's new in txtai 6.0 notebook (#519)
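For readers unfamiliar with the WAL option for SQLite (#488), the following sketch shows what write-ahead logging means at the SQLite level, using Python's built-in `sqlite3` module. It does not show how txtai exposes the option; the `connect_wal` helper and the `documents` table are hypothetical.

```python
import os
import sqlite3
import tempfile


def connect_wal(path):
    """Open a SQLite database file and switch it to write-ahead logging (WAL)."""
    connection = sqlite3.connect(path)
    # The pragma returns the journal mode now in effect ("wal" on success).
    # WAL needs a file-backed database; in-memory databases report "memory".
    mode = connection.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return connection, mode


with tempfile.TemporaryDirectory() as tmp:
    conn, mode = connect_wal(os.path.join(tmp, "documents.db"))
    conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT)")
    conn.execute("INSERT INTO documents VALUES (?, ?)", ("0", "hello"))
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
    # Close before the temporary directory is removed
    conn.close()

print(mode, count)  # wal 1
```

In WAL mode, writes are appended to a separate log file instead of overwriting the main database file, which lets readers proceed concurrently with a single writer.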

Bug Fixes

  • OpenMP issues with torch 1.13+ on macOS (#377)
  • Unique constraint violation issue with DuckDB (#475)
  • Incorrect results can be returned by embedding search when Content storage enabled (#496)
  • Fix issues with graph.infertopics (#516)