v3.0.0

davidmezzetti released this 04 May 19:17

txtai 3.0.0 is a major release with a significant number of new features. This release overhauls the project structure, consolidates logic into pipelines and introduces workflows.

Summary of txtai features:

  • 🔎 Large-scale similarity search with multiple index backends (Faiss, Annoy, Hnswlib)
  • 📄 Create embeddings for text snippets, documents, audio and images. Supports transformers and word vectors.
  • 💡 Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction
  • ↪️ Workflows that join pipelines together to aggregate business logic. txtai processes can be microservices or full-fledged indexing workflows.
  • 🔗 API bindings for JavaScript, Java, Rust and Go
  • ☁️ Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes)
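The workflow idea in the list above (pipelines joined so that each one's output feeds the next) can be illustrated with a small sketch in plain Python. This is not txtai's API: `segment`, `summarize`, `run_workflow` and the `batch` parameter are hypothetical names chosen for illustration, and the two "pipelines" are toy string functions standing in for real models.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for a pipeline: takes a batch of texts, returns a batch of texts.
Pipeline = Callable[[List[str]], List[str]]


def segment(texts: List[str]) -> List[str]:
    """Toy segmentation step: split each text into sentences on periods."""
    return [s.strip() for text in texts for s in text.split(".") if s.strip()]


def summarize(texts: List[str]) -> List[str]:
    """Toy summary step: keep only the first three words of each text."""
    return [" ".join(text.split()[:3]) for text in texts]


def apply_tasks(tasks: List[Pipeline], items: List[str]) -> List[str]:
    """Run one batch through every task in order."""
    for task in tasks:
        items = task(items)
    return items


def run_workflow(tasks: List[Pipeline], elements: Iterable[str], batch: int = 2) -> Iterator[str]:
    """Group elements into batches and pass each batch through the task chain."""
    pending: List[str] = []
    for element in elements:
        pending.append(element)
        if len(pending) == batch:
            yield from apply_tasks(tasks, pending)
            pending = []
    # Flush the final, possibly smaller, batch
    if pending:
        yield from apply_tasks(tasks, pending)


docs = [
    "Workflows join pipelines together. Each task feeds the next.",
    "Pipelines wrap models.",
]
for line in run_workflow([segment, summarize], docs):
    print(line)
```

Keeping every step a callable over a batch means the steps stay independent of each other, so they can be reordered or swapped without changing the runner.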

New Features

  • Add Docker file for API (#59)
  • Require Faiss 1.7.0 (#60)
  • Add summary pipeline (#65)
  • Add text extraction pipeline (#66)
  • Add transcription pipeline (#67)
  • Add translation pipeline (#68)
  • Add workflow framework (#69)
  • Add additional pipeline abstraction layer for tensor frameworks (#70)
  • Add tests for new v3 functionality (#71)
  • Add notebooks covering new v3 functionality (#73)
  • Add Pipeline Factory (#76)
  • Add API extensions (#77)
  • Add workflow builder application (#80)
  • Add text segmentation pipeline (#81)
  • Add workflow to API (#82)
  • Add service workflow task (#83)
  • Add object storage workflow task (#84)
  • Add URL workflow task (#85)

Improvements

  • Refactor code into smaller components and modules (#63)
  • Modify pipeline to accept GPU device id (#64)
  • Allow direct download of sentence-transformer models (#72)
  • Update documentation, add site through GitHub pages (#75)
  • Modularize the API (#78)
  • Add default truncation to pipelines (#79)

Bug Fixes

  • Non intuitive behaviour of Tokenizer (#61)
  • [Python 3.9, Mac OS] Code hangs while building embedding index (#62)
  • embeddings.index Truncation RuntimeError: The size of tensor a (889) must match the size of tensor b (512) at non-singleton dimension 1 (#74)
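The error in #74 reports a mismatch between sizes 889 and 512. That is consistent with an 889-token input reaching a model that accepts at most 512 positions, though the release note does not state the cause, so treat this reading as an assumption. The default truncation added in #79 addresses inputs of this kind. A minimal sketch of tail truncation, using a hypothetical `truncate` helper rather than txtai or tokenizer code:

```python
from typing import List


def truncate(token_ids: List[int], max_length: int = 512) -> List[int]:
    """Keep at most max_length token ids, dropping anything past the limit."""
    return token_ids[:max_length]


# Simulated 889-token input (the ids are placeholders, not real vocabulary ids)
tokens = list(range(889))
clipped = truncate(tokens)
print(len(tokens), "->", len(clipped))
```

Dropping the tail is the simplest policy. Splitting a long input into several windows keeps more of the text but multiplies the number of model calls.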