# minDB - an extremely memory-efficient vector database
Most vector databases are built on HNSW indexes that must be held entirely in memory to be used. This uses an extremely large amount of memory, which severely limits the sizes of vector DBs that can be used locally, and creates very high costs for cloud deployments.

It’s possible to build a vector database with extremely low memory requirements that still has high recall and low latency. The key is to use a highly compressed search index, combined with reranking from disk, as demonstrated in the [Zoom](https://arxiv.org/abs/1809.04067) paper. This project implements the core technique introduced in that paper. We also implement a novel adaptation of Faiss's two-level k-means clustering algorithm that only requires a small subset of vectors to be held in memory at any given time.
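For intuition, here is a minimal sketch of the two-level idea, using `faiss.Kmeans` with toy sizes; it is illustrative, not minDB's actual clustering code. A coarse k-means is trained on a small sample, then a fine k-means runs inside each coarse cell, so only one cell's members need to be in memory at a time:

```python
import numpy as np
import faiss

d, n_coarse, n_fine = 64, 16, 4             # toy sizes
sample = np.random.rand(20_000, d).astype("float32")

# Level 1: coarse k-means, trained on a small sample of the dataset.
coarse = faiss.Kmeans(d, n_coarse, niter=20)
coarse.train(sample)

# Level 2: assign vectors to coarse cells, then run k-means inside each cell.
# In a streaming setting, each cell's members could be loaded from disk one
# cell at a time, so the full dataset never needs to be in memory at once.
_, assign = coarse.index.search(sample, 1)
fine_centroids = []
for c in range(n_coarse):
    members = sample[assign.ravel() == c]
    if len(members) >= n_fine:              # skip underpopulated cells
        fine = faiss.Kmeans(d, n_fine, niter=10)
        fine.train(members)
        fine_centroids.append(fine.centroids)
fine_centroids = np.vstack(fine_centroids)  # up to n_coarse * n_fine centroids
```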

With minDB, you can index and query 100M 768d vectors with peak memory usage of around 3GB. With an in-memory vector DB, you would need ~340GB of RAM. This means you could easily index and query all of Wikipedia on an average MacBook.
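A rough back-of-the-envelope for those figures, under illustrative assumptions (float32 vectors, an HNSW graph with `M = 16`, and ~32 bytes per compressed code; actual numbers depend on index configuration and overhead):

```python
n, d = 100_000_000, 768

in_memory = n * (d * 4 + 16 * 2 * 4)  # float32 vectors + HNSW edges (M = 16)
compressed = n * 32                   # assumed ~32 bytes per compressed code

print(f"in-memory HNSW:   ~{in_memory / 1e9:.0f} GB")   # ~320 GB, in the
                                                        # ballpark of ~340GB
print(f"compressed index: ~{compressed / 1e9:.1f} GB")  # ~3.2 GB
```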

**Disclaimer:** minDB has not been fully tested in a production environment. It is possible there are bugs or edge cases that have not been tested, or additional limitations that are not listed here. There may also be breaking changes in the future.

## Performance evaluation

To evaluate the performance of minDB, we compared it with a commonly used HNSW-based vector database, [Chroma](https://github.com/chroma-core/chroma). We used the FIQA-2018 dataset from the BEIR datasets library, found [here](https://github.com/beir-cellar/beir?tab=readme-ov-file). This dataset has 57,638 text chunks in the corpus, and 648 test queries. Embeddings were calculated for each chunk and query in the dataset using Cohere's embed-multilingual-v2.0 model, with a vector dimension of 768. We then measured recall (20@20), mean latency, and memory usage for both minDB and Chroma.
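Here, recall 20@20 is the fraction of the exact top-20 nearest neighbors that appear in the returned top-20. A hypothetical helper illustrating the metric (not the notebook's code):

```python
import numpy as np

def recall_at_k(retrieved: np.ndarray, exact: np.ndarray, k: int = 20) -> float:
    """Fraction of the exact top-k ids that appear in the retrieved top-k."""
    return len(set(retrieved[:k]) & set(exact[:k])) / k
```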

| | minDB | Chroma |
|----------------|------------|-------------|
| Recall | 0.995 | 0.923 |
| Latency | 5.04 ms | 3.95 ms |
| Memory (RAM) | 5.82 MB | 175.9 MB |

As the table above shows, minDB achieves much higher recall while using ~30x less memory. This comes at the expense of slightly higher latency, but that difference is immaterial for most RAG applications.

Recall and latency are measured using a `top_k` of 20. For minDB, we used a `preliminary_top_k` of 200. Memory usage for minDB is the size of the Faiss index. Chroma uses an HNSW index, whose memory usage in bytes per vector is `(d * 4 + M * 2 * 4)`, where `d` is the dimensionality of the indexed vectors and `M` is the number of edges per node in the constructed graph. Chroma uses 16 for `M`, the vector dimension in this example is 768, and the dataset has 57,638 vectors, giving `(768 * 4 + 16 * 2 * 4) * 57638` bytes.
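Working that expression through reproduces the Chroma figure in the table (the 175.9 value corresponds to mebibytes):

```python
d, M, n = 768, 16, 57_638

hnsw_bytes = (d * 4 + M * 2 * 4) * n
print(hnsw_bytes)          # 184441600 bytes
print(hnsw_bytes / 2**20)  # ≈ 175.9, matching the table (in MiB)
```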

The full code used to calculate these numbers is available in [this notebook](https://github.com/D-Star-AI/minDB/blob/main/eval/minDB_performance_eval.ipynb).

## Architecture overview
minDB uses a two-step process to perform approximate nearest neighbors search. First, a highly compressed Faiss index is searched to find the `preliminary_top_k` (set to 500 by default) results. Then the full uncompressed vectors for these results are retrieved from a key-value store on disk, and a k-nearest neighbors search is performed on these vectors to arrive at the `final_top_k` results.
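A minimal sketch of this two-step flow, using an IVF-PQ index as the compressed stage and a plain dict standing in for minDB's on-disk key-value store (toy data and illustrative parameters; not minDB's internal code):

```python
import numpy as np
import faiss

d = 768
corpus = np.random.rand(20_000, d).astype("float32")
query = np.random.rand(1, d).astype("float32")

# Step 1: search a highly compressed index for the preliminary candidates.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, 64, 32, 8)  # 32-byte PQ codes
index.train(corpus)
index.add(corpus)
index.nprobe = 8

preliminary_top_k, final_top_k = 500, 20
_, prelim_ids = index.search(query, preliminary_top_k)

# Step 2: fetch the full uncompressed vectors for the candidates (from a
# key-value store on disk in minDB; a dict here) and rerank with exact k-NN.
store = {i: corpus[i] for i in range(len(corpus))}
cand_ids = prelim_ids[0][prelim_ids[0] != -1]      # drop any empty slots
candidates = np.stack([store[i] for i in cand_ids])
dists = np.linalg.norm(candidates - query[0], axis=1)
final_ids = cand_ids[np.argsort(dists)[:final_top_k]]
```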
You can also learn more about FastAPI [here](https://fastapi.tiangolo.com).

## Limitations
- One of the main dependencies, Faiss, doesn't play nice with Apple M1/M2 chips. You may be able to get it to work by building it from source, but we haven't successfully done so yet.
- We haven't tested it on datasets larger than 35M vectors yet. It should still work well up to 100-200M vectors, but beyond that performance may start to deteriorate.


## Additional documentation
- [Tunable parameters](https://github.com/D-Star-AI/minDB/wiki/Tunable-parameters)
- [Contributing](https://github.com/D-Star-AI/minDB/wiki/Contributing)