feat: Adding query pipeline description

- Bumping latest release
amikos-tech · Jul 25, 2024 · c7d649f
1 parent 29d867b
commit c7d649f
Showing 4 changed files with 77 additions and 2 deletions.
diff --git a/docs/assets/images/query-pipeline.png b/docs/assets/images/query-pipeline.png
diff --git a/docs/core/advanced/queries.md b/docs/core/advanced/queries.md
@@ -0,0 +1,67 @@
+# Chroma Queries
+
+This document attempts to capture how Chroma performs queries.
+
+## Basic concepts
+
+Chroma uses two types of indices (segments) which it queries over:
+
+- Metadata Index - stored in the `chroma.sqlite3` file and queried with SQL. Chroma stores metadata for all
+  collections in this index.
+- Vector Index - the `HNSW` index stored in the UUID-named directories under Chroma's persistent directory (or in
+  memory for `EphemeralClient`). There is one index per collection.
+
+### Metadata Index
+
+The metadata index consists of two tables:
+
+- `embeddings` - a one-to-one mapping with the vectors stored in your collections
+- `embedding_metadata` - an N+1 mapping to the vectors stored in your collections, where `N` is the number of
+  metadata fields per record and can vary between records. There is at least one entry in the `embedding_metadata`
+  table per embedding, which represents the document.
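
The N+1 relationship between the two tables can be illustrated with a small in-memory SQLite sketch. Only the table names `embeddings` and `embedding_metadata` come from the text above; the columns, the `chroma:document` key, and the sample values are simplifying assumptions and not the actual `chroma.sqlite3` schema.

```python
import sqlite3

# Hypothetical, simplified schema: the real tables carry more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE embeddings (
        id INTEGER PRIMARY KEY,
        embedding_id TEXT NOT NULL
    );
    CREATE TABLE embedding_metadata (
        id INTEGER REFERENCES embeddings(id),
        key TEXT NOT NULL,
        string_value TEXT,
        PRIMARY KEY (id, key)
    );
    """
)

# One row in `embeddings` per stored vector (one-to-one).
conn.execute("INSERT INTO embeddings (id, embedding_id) VALUES (1, 'doc-1')")

# N metadata fields plus one row for the document itself (N+1).
conn.executemany(
    "INSERT INTO embedding_metadata (id, key, string_value) VALUES (?, ?, ?)",
    [
        (1, "chroma:document", "hello world"),  # assumed key for the "+1" row
        (1, "author", "alice"),
        (1, "lang", "en"),
    ],
)

# Count metadata rows per record: here N = 2 fields, so 3 rows in total.
rows_per_record = conn.execute(
    "SELECT e.embedding_id, COUNT(m.key) "
    "FROM embeddings e JOIN embedding_metadata m ON m.id = e.id "
    "GROUP BY e.embedding_id"
).fetchall()
print(rows_per_record)
```

A record with no user-supplied metadata would, under the same assumptions, still have a single row holding its document.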
+
+## Query Pipeline
+
+The query pipeline in Chroma:
+
+- Validation - the query is validated
+- Metadata pre-filter - Chroma plans a SQL query to select IDs to pass to KNN search. This step is skipped if neither
+  `where` nor `where_document` is provided.
+- KNN search in HNSW index - Similarity search based on the embedded user query or queries. If the metadata
+  pre-filter returned any IDs to search on, only those IDs are searched. The KNN search also returns the actual
+  vectors if `include` contains `embeddings`.
+- Post-search query to fetch metadata - Fetch metadata for the IDs returned from the KNN search.
+- Result aggregation - Aggregate the results from the metadata and the KNN search and ensure all fields requested in
+  `include` are populated.
+
+??? note "Query Pipeline?"
+
+    Why is it called a pipeline? Because each step in the query process depends on its predecessor's output.
+
+![Query Pipeline](../../assets/images/query-pipeline.png)
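
The middle three steps of the pipeline (pre-filter, KNN search, metadata fetch) can be sketched in plain Python. This is a toy model under stated assumptions: dictionaries stand in for the SQLite and HNSW indices, a brute-force Euclidean scan replaces the HNSW search, and every function name here (`prefilter`, `knn`, `query`) is hypothetical rather than part of Chroma.

```python
import math

# Toy stand-ins for the two indices (hypothetical data, not Chroma's storage).
METADATA = {
    "a": {"lang": "en"},
    "b": {"lang": "de"},
    "c": {"lang": "en"},
}
VECTORS = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [3.0, 4.0],
}


def prefilter(where):
    """Metadata pre-filter: return matching IDs, or None when no filter is given."""
    if not where:
        return None  # step skipped, search everything
    return {
        rec_id
        for rec_id, md in METADATA.items()
        if all(md.get(k) == v for k, v in where.items())
    }


def knn(query_embedding, n_results, allowed_ids=None):
    """Brute-force nearest neighbours, restricted to allowed_ids when provided."""
    if allowed_ids is None:
        candidates = VECTORS
    else:
        candidates = {rec_id: VECTORS[rec_id] for rec_id in allowed_ids}
    scored = sorted(
        (math.dist(query_embedding, vec), rec_id)
        for rec_id, vec in candidates.items()
    )
    return [(rec_id, dist) for dist, rec_id in scored[:n_results]]


def query(query_embedding, n_results, where=None):
    allowed = prefilter(where)                      # metadata pre-filter
    hits = knn(query_embedding, n_results, allowed)  # KNN search
    # Post-search fetch: look up metadata only for the IDs that KNN returned.
    return [
        {"id": rec_id, "distance": dist, "metadata": METADATA[rec_id]}
        for rec_id, dist in hits
    ]


results = query([1.0, 0.0], n_results=2, where={"lang": "en"})
print([r["id"] for r in results])
```

Because each function consumes the previous one's output, dropping the `where` argument changes which vectors the search sees: record `b` is the closest match without the filter but is never considered with it.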
+
+### Validation
+
+The following validations are performed:
+
+- Validate `where` if present
+- Validate `where_document` if present
+- Ensure collection exists
+- Validate query embeddings dimensions match that of the collection
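
As a rough illustration of the four checks listed above, the sketch below validates a query against a dictionary of collections. The function name, the `collections` structure, and the specific rules (for example, that a filter must be a `dict`) are assumptions made for the example; Chroma's own validation is more detailed.

```python
def validate_query(collections, name, query_embeddings, where=None, where_document=None):
    """Hypothetical validator mirroring the four checks; raises ValueError on failure."""
    if where is not None and not isinstance(where, dict):
        raise ValueError("where must be a dict")
    if where_document is not None and not isinstance(where_document, dict):
        raise ValueError("where_document must be a dict")
    if name not in collections:
        raise ValueError(f"collection {name!r} does not exist")
    expected = collections[name]["dimension"]
    for emb in query_embeddings:
        if len(emb) != expected:
            raise ValueError(f"expected dimension {expected}, got {len(emb)}")


# Assumed bookkeeping: each collection records the dimension of its vectors.
collections = {"docs": {"dimension": 3}}

# A well-formed query passes silently.
validate_query(collections, "docs", [[0.1, 0.2, 0.3]], where={"lang": "en"})
```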
+
+### Metadata Pre-Filter
+
+TBD
+
+### KNN Search in HNSW Index
+
+TBD
+
+### Post-Search Query to Fetch Metadata
+
+TBD
+
+### Result Aggregation
+
+Result aggregation makes sure that results from the metadata fetch and the KNN search are fused together into the final
+result set.
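
One way to picture the fusion step is a function that merges KNN hits with fetched metadata rows and emits only the requested fields. The input shapes, the `aggregate` name, and the one-list-per-query result layout are assumptions for this sketch, not a description of Chroma's internals.

```python
def aggregate(knn_hits, metadata_rows, include):
    """Fuse per-query KNN hits with fetched rows, keeping only requested fields.

    knn_hits: one list per query of (id, distance) pairs.
    metadata_rows: id -> {"document": str, "metadata": dict}.
    """
    result = {"ids": [[rec_id for rec_id, _ in hits] for hits in knn_hits]}
    if "distances" in include:
        result["distances"] = [[dist for _, dist in hits] for hits in knn_hits]
    if "metadatas" in include:
        result["metadatas"] = [
            [metadata_rows[rec_id]["metadata"] for rec_id, _ in hits]
            for hits in knn_hits
        ]
    if "documents" in include:
        result["documents"] = [
            [metadata_rows[rec_id]["document"] for rec_id, _ in hits]
            for hits in knn_hits
        ]
    return result


# Hypothetical outputs of the two preceding steps for a single query.
knn_hits = [[("a", 0.1), ("c", 0.7)]]
metadata_rows = {
    "a": {"document": "first", "metadata": {"lang": "en"}},
    "c": {"document": "third", "metadata": {"lang": "en"}},
}

fused = aggregate(knn_hits, metadata_rows, include=["distances", "documents"])
print(fused)
```

The order of every per-query list follows the KNN ranking, so position `i` in `ids`, `distances`, and `documents` always refers to the same record.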
diff --git a/docs/index.md b/docs/index.md
@@ -2,7 +2,7 @@
 
 This is a collection of small guides and recipes to help you get started with ChromaDB.
 
-Latest ChromaDB version: [0.5.4](https://github.com/chroma-core/chroma/releases/tag/0.5.4)
+Latest ChromaDB version: [0.5.5](https://github.com/chroma-core/chroma/releases/tag/0.5.5)
 
 **Latest Releases highlights:**
 

diff --git a/docs/running/deployment-patterns.md b/docs/running/deployment-patterns.md
@@ -1,4 +1,12 @@
 # Deployment Patterns
 
-In this section we'll cover a patterns of how to deploy GenAI applications using Chroma as a vector store.
+In this section we'll cover patterns for deploying Chroma for your GenAI applications.
+
+## Embedded in your application
+
+
+## Standalone server
+
+
+###