Documentation updates

neuml · Nov 24, 2024 · 0f916d3
1 parent ff4d4da
commit 0f916d3

Showing 5 changed files with 20 additions and 10 deletions.
diff --git a/README.md b/README.md
@@ -32,9 +32,11 @@ txtai is an all-in-one embeddings database for semantic search, LLM orchestratio
 ![architecture](https://raw.githubusercontent.com/neuml/txtai/master/docs/images/architecture.png#gh-light-mode-only)
 ![architecture](https://raw.githubusercontent.com/neuml/txtai/master/docs/images/architecture-dark.png#gh-dark-mode-only)
 
-Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling, graph analysis and more.
+Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases.
 
-Embeddings databases can stand on their own and/or serve as a powerful knowledge source for large language model (LLM) prompts. Build autonomous agents, retrieval augmented generation (RAG) processes and multi-model workflows.
+This foundation enables vector search and/or serves as a powerful knowledge source for large language model (LLM) applications.
+
+Build autonomous agents, retrieval augmented generation (RAG) processes, multi-model workflows and more.
 
 Summary of txtai features:
 
@@ -44,6 +46,7 @@ Summary of txtai features:
 - ↪️️ Workflows to join pipelines together and aggregate business logic. txtai processes can be simple microservices or multi-model workflows.
 - 🤖 Agents that intelligently connect embeddings, pipelines, workflows and other agents together to autonomously solve complex problems
 - ⚙️ Build with Python or YAML. API bindings available for [JavaScript](https://github.com/neuml/txtai.js), [Java](https://github.com/neuml/txtai.java), [Rust](https://github.com/neuml/txtai.rs) and [Go](https://github.com/neuml/txtai.go).
+- 🔋 Batteries included with defaults to get up and running fast
 - ☁️ Run local or scale out with container orchestration
 
 txtai is built with Python 3.9+, [Hugging Face Transformers](https://github.com/huggingface/transformers), [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) and [FastAPI](https://github.com/tiangolo/fastapi). txtai is open-source under an Apache 2.0 license.
@@ -194,7 +197,7 @@ See the table below for the current recommended models. These models all allow c
 | [Image Captions](https://neuml.github.io/txtai/pipeline/image/caption)        | [BLIP](https://hf.co/Salesforce/blip-image-captioning-base)              |
 | [Labels - Zero Shot](https://neuml.github.io/txtai/pipeline/text/labels)      | [BART-Large-MNLI](https://hf.co/facebook/bart-large)                     |
 | [Labels - Fixed](https://neuml.github.io/txtai/pipeline/text/labels)          | Fine-tune with [training pipeline](https://neuml.github.io/txtai/pipeline/train/trainer)          |
-| [Large Language Model (LLM)](https://neuml.github.io/txtai/pipeline/text/llm) | [Mistral 7B OpenOrca](https://hf.co/Open-Orca/Mistral-7B-OpenOrca)       |
+| [Large Language Model (LLM)](https://neuml.github.io/txtai/pipeline/text/llm) | [Llama 3.1 Instruct](https://hf.co/meta-llama/Llama-3.1-8B-Instruct)     |
 | [Summarization](https://neuml.github.io/txtai/pipeline/text/summary)          | [DistilBART](https://hf.co/sshleifer/distilbart-cnn-12-6)                |
 | [Text-to-Speech](https://neuml.github.io/txtai/pipeline/audio/texttospeech)   | [ESPnet JETS](https://hf.co/NeuML/ljspeech-jets-onnx)                    |
 | [Transcription](https://neuml.github.io/txtai/pipeline/audio/transcription)   | [Whisper](https://hf.co/openai/whisper-base)                             | 

diff --git a/docs/faq.md b/docs/faq.md
@@ -66,8 +66,8 @@ import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from txtai import LLM
 
-# Load Mistral-7B-OpenOrca
-path = "Open-Orca/Mistral-7B-OpenOrca"
+# Load Phi 3.5-mini
+path = "microsoft/Phi-3.5-mini-instruct"
 model = AutoModelForCausalLM.from_pretrained(
   path,
   torch_dtype=torch.bfloat16,
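
The docs/faq.md hunk above cuts off inside the `from_pretrained` call. As a rough sketch of how the updated snippet might be completed, the example below closes the call, loads a tokenizer from the same path, and passes both to `LLM`. The `load_llm` helper and `MODEL_PATH` constant are hypothetical names added for illustration, and passing a `(model, tokenizer)` tuple to `LLM` is an assumption, since that part of the file is not shown in the diff.

```python
# Sketch only: everything after `torch_dtype=torch.bfloat16,` is assumed,
# since the hunk above ends at that line.

MODEL_PATH = "microsoft/Phi-3.5-mini-instruct"  # path taken from the hunk above


def load_llm(path=MODEL_PATH):
    """Load a causal LM with custom arguments and wrap it in a txtai LLM pipeline."""
    # Heavy imports live inside the function so that defining it stays cheap
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from txtai import LLM

    # Load the model in bfloat16, as in the hunk
    model = AutoModelForCausalLM.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(path)

    # Assumption: LLM accepts a (model, tokenizer) tuple in place of a model path
    return LLM((model, tokenizer))


# Example usage (downloads the model weights on first call):
#   llm = load_llm()
#   print(llm("Answer in one sentence: what is an embeddings database?"))
```

Nothing is downloaded until `load_llm` is called, so the model path or dtype can be changed before any weights are fetched.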

diff --git a/docs/index.md b/docs/index.md
@@ -34,9 +34,11 @@ txtai is an all-in-one embeddings database for semantic search, LLM orchestratio
 ![architecture](images/architecture.png#gh-light-mode-only)
 ![architecture](images/architecture-dark.png#gh-dark-mode-only)
 
-Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling, graph analysis and more.
+Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases.
 
-Embeddings databases can stand on their own and/or serve as a powerful knowledge source for large language model (LLM) prompts. Build autonomous agents, retrieval augmented generation (RAG) processes and multi-model workflows.
+This foundation enables vector search and/or serves as a powerful knowledge source for large language model (LLM) applications.
+
+Build autonomous agents, retrieval augmented generation (RAG) processes, multi-model workflows and more.
 
 Summary of txtai features:
 
@@ -46,6 +48,7 @@ Summary of txtai features:
 - ↪️️ Workflows to join pipelines together and aggregate business logic. txtai processes can be simple microservices or multi-model workflows.
 - 🤖 Agents that intelligently connect embeddings, pipelines, workflows and other agents together to autonomously solve complex problems
 - ⚙️ Build with Python or YAML. API bindings available for [JavaScript](https://github.com/neuml/txtai.js), [Java](https://github.com/neuml/txtai.java), [Rust](https://github.com/neuml/txtai.rs) and [Go](https://github.com/neuml/txtai.go).
+- 🔋 Batteries included with defaults to get up and running fast
 - ☁️ Run local or scale out with container orchestration
 
 txtai is built with Python 3.9+, [Hugging Face Transformers](https://github.com/huggingface/transformers), [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) and [FastAPI](https://github.com/tiangolo/fastapi). txtai is open-source under an Apache 2.0 license.

diff --git a/docs/models.md b/docs/models.md
@@ -10,7 +10,7 @@ See the table below for the current recommended models. These models all allow c
 | [Image Captions](./pipeline/image/caption.md)        | [BLIP](https://hf.co/Salesforce/blip-image-captioning-base)              |
 | [Labels - Zero Shot](./pipeline/text/labels.md)      | [BART-Large-MNLI](https://hf.co/facebook/bart-large)                     |
 | [Labels - Fixed](./pipeline/text/labels.md)          | Fine-tune with [training pipeline](./pipeline/train/trainer.md)          |
-| [Large Language Model (LLM)](./pipeline/text/llm.md) | [Mistral 7B OpenOrca](https://hf.co/Open-Orca/Mistral-7B-OpenOrca)       |
+| [Large Language Model (LLM)](./pipeline/text/llm.md) | [Llama 3.1 Instruct](https://hf.co/meta-llama/Llama-3.1-8B-Instruct)     |
 | [Summarization](./pipeline/text/summary.md)          | [DistilBART](https://hf.co/sshleifer/distilbart-cnn-12-6)                |
 | [Text-to-Speech](./pipeline/audio/texttospeech.md)   | [ESPnet JETS](https://hf.co/NeuML/ljspeech-jets-onnx)                    |
 | [Transcription](./pipeline/audio/transcription.md)   | [Whisper](https://hf.co/openai/whisper-base)                             | 

diff --git a/examples/01_Introducing_txtai.ipynb b/examples/01_Introducing_txtai.ipynb
@@ -35,17 +35,21 @@
         "\n",
         "[txtai](https://github.com/neuml/txtai) is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.\n",
         "\n",
-        "Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling, retrieval augmented generation (RAG) and more.\n",
+        "Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases.\n",
         "\n",
-        "Embeddings databases can stand on their own and/or serve as a powerful knowledge source for large language model (LLM) prompts.\n",
+        "This foundation enables vector search and/or serves as a powerful knowledge source for large language model (LLM) applications.\n",
+        "\n",
+        "Build autonomous agents, retrieval augmented generation (RAG) processes, multi-model workflows and more.\n",
         "\n",
         "The following is a summary of key features:\n",
         "\n",
         "- 🔎 Vector search with SQL, object storage, topic modeling, graph analysis and multimodal indexing\n",
         "- 📄 Create embeddings for text, documents, audio, images and video\n",
         "- 💡 Pipelines powered by language models that run LLM prompts, question-answering, labeling, transcription, translation, summarization and more\n",
         "- ↪️️ Workflows to join pipelines together and aggregate business logic. txtai processes can be simple microservices or multi-model workflows.\n",
+        "- 🤖 Agents that intelligently connect embeddings, pipelines, workflows and other agents together to autonomously solve complex problems\n",
         "- ⚙️ Build with Python or YAML. API bindings available for [JavaScript](https://github.com/neuml/txtai.js), [Java](https://github.com/neuml/txtai.java), [Rust](https://github.com/neuml/txtai.rs) and [Go](https://github.com/neuml/txtai.go).\n",
+        "- 🔋 Batteries included with defaults to get up and running fast\n",
         "- ☁️ Run local or scale out with container orchestration\n",
         "\n",
         "txtai is built with Python 3.9+, [Hugging Face Transformers](https://github.com/huggingface/transformers), [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) and [FastAPI](https://github.com/tiangolo/fastapi). txtai is open-source under an Apache 2.0 license."