Query-Adaptive KNN Index

Quake is a C++ library (with Python bindings) for dynamic, high-performance approximate nearest neighbor (ANN) search. Its core operations (building a dynamic index, adaptive search, real-time updates, and automatic maintenance) are designed to sustain high-throughput updates and low-latency queries without manual tuning.

Key Advantages

Dynamic Indexing:
Build, update and automatically maintain the index in real time without full rebuilds.
Adaptive Search:
Specify a recall target (e.g. 90% recall) and let Quake automatically choose the number of partitions to scan.
High Performance:
Quake uses multi-threading, SIMD instructions, and NUMA-aware execution to deliver both low latency and high throughput.
PyTorch Integration:
Directly work with PyTorch tensors for easy integration into machine learning workflows.
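A recall target is usually expressed as recall@k: the fraction of a query's true k nearest neighbors that the search actually returns. The minimal sketch below assumes Quake's recall target follows this common definition (the README does not spell it out); `recall_at_k` and `mean_recall` are hypothetical helpers, not part of Quake's API.

```python
def recall_at_k(returned_ids, true_ids):
    """Fraction of the true k nearest neighbors that the search returned."""
    if not true_ids:
        raise ValueError("true_ids must be non-empty")
    return len(set(returned_ids) & set(true_ids)) / len(true_ids)


def mean_recall(pairs):
    """Average recall@k over (returned_ids, true_ids) pairs, one pair per query."""
    scores = [recall_at_k(returned, truth) for returned, truth in pairs]
    return sum(scores) / len(scores)


true_top10 = list(range(10))
returned = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]  # one true neighbor (9) was missed
print(recall_at_k(returned, true_top10))  # 0.9
```

Under this definition, a 90% target with k = 10 means that, on average, 9 of the 10 true neighbors come back per query.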

Key Limitations (for now)

Quake currently has the following limitations, which are planned to be addressed in future development.

Metrics: Only “l2” (Euclidean distance) and “ip” (inner product) are supported.
Data Types: Vectors must be float32 and IDs must be int64.
CPU-only Search: Even when Quake is built with GPU support, search runs on the CPU.
Vectors Only: Storing or filtering on per-vector attributes is not yet supported.
Single Node: Quake runs on a single machine; distributed deployment is not yet supported.
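The two metrics can rank the same candidates differently: for “l2” a smaller distance means more similar, while for “ip” a larger score means more similar. The plain-Python sketch below only illustrates the definitions. It is not Quake code, and whether Quake reports squared or unsquared L2 distances is not stated here.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def rank(query, candidates, metric):
    """Order candidate names from most to least similar under the given metric."""
    if metric == "l2":
        return sorted(candidates, key=lambda name: l2_distance(query, candidates[name]))
    if metric == "ip":
        return sorted(candidates, key=lambda name: -inner_product(query, candidates[name]))
    raise ValueError(f"unsupported metric: {metric!r}")


query = [1.0, 0.0]
candidates = {"v1": [0.9, 0.1], "v2": [3.0, 3.0]}
print(rank(query, candidates, "l2"))  # ['v1', 'v2']
print(rank(query, candidates, "ip"))  # ['v2', 'v1']
```

For PyTorch inputs in other dtypes, `tensor.float()` and `tensor.long()` convert to the float32 and int64 types required above.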

Quick Setup

Clone the Repository:

```
git clone https://github.com/marius-team/quake.git
cd quake
git submodule update --init --recursive
```

Set Up the Conda Environment:

Create and activate the environment using the provided YAML file:
```
conda env create -f environments/ubuntu-latest/conda.yaml
conda activate quake-env
```
Install Quake:
```
pip install .
```
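After `pip install .`, a quick sanity check inside the activated `quake-env` is to confirm that both `quake` and `torch` can be found on the import path. This sketch assumes the installed package is imported as `quake`, as in the examples below. `importlib.util.find_spec` only locates a module without importing it, so a plain `import quake` remains the final check.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["quake", "torch"])
print("all set" if not missing else f"missing: {missing}")
```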

Using the Python API

Quake’s Python API exposes four core operations: building, searching, updating, and maintaining an index. The following sections walk through each one. For a complete working demonstration, see the full example in examples/quickstart.py.

1. Building the Index

Operation: Create an index from your dataset.
Steps:
- Instantiate a QuakeIndex object.
- Configure IndexBuildParams (e.g., set the number of clusters and metric type).
- Call the build() method with your vectors and corresponding IDs.

Example:

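```
import quake
import torch

# 10,000 random 128-dimensional vectors (torch.randn returns float32 by default)
vectors = torch.randn(10000, 128)
# One unique ID per vector (torch.arange returns int64 by default)
ids = torch.arange(10000)

index = quake.QuakeIndex()
build_params = quake.IndexBuildParams()
build_params.nlist = 1024   # Number of clusters (partitions)
build_params.metric = "l2"  # Use Euclidean distance
index.build(vectors, ids, build_params)
```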
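The `nlist` parameter sets how many clusters the vectors are grouped into. To show what those clusters are, here is a generic, IVF-style sketch that partitions vectors with Lloyd's k-means in plain Python. It is an illustration under that assumption, not Quake's build routine, and `kmeans_partition` is a hypothetical name.

```python
import random


def sq_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_partition(vectors, ids, nlist, iters=10, seed=0):
    """Cluster vectors into `nlist` partitions with Lloyd's k-means.

    Returns (centroids, partitions), where partitions[c] lists the IDs assigned to centroid c.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        assign = [min(range(nlist), key=lambda c: sq_l2(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(nlist):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a partition empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    partitions = [[] for _ in range(nlist)]
    for vid, v in zip(ids, vectors):
        nearest = min(range(nlist), key=lambda c: sq_l2(v, centroids[c]))
        partitions[nearest].append(vid)
    return centroids, partitions


rng = random.Random(1)
vectors = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]
ids = list(range(500))
centroids, partitions = kmeans_partition(vectors, ids, nlist=8)
print([len(p) for p in partitions])
```

Every ID lands in exactly one partition, which is what lets a search scan only a few partitions instead of the whole dataset.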

2. Searching the Index

Operation: Execute search queries.
Steps:
- Create a SearchParams instance and set the number of neighbors (k), plus either the number of partitions to probe (nprobe) or a recall target.
- Use the search() method with your query tensor.
- Retrieve neighbor IDs and distances.

Example:

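```
queries = torch.randn(10, 128)  # 10 query vectors with the same dimension as the index

search_params = quake.SearchParams()
search_params.k = 10       # Return the 10 nearest neighbors per query
search_params.nprobe = 10  # Scan the 10 partitions closest to each query
result = index.search(queries, search_params)
```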
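The `nprobe` setting controls how many partitions are scanned per query. The sketch below shows the generic partitioned (IVF-style) search idea in plain Python: rank centroids by distance to the query, scan only the `nprobe` closest partitions, and keep the top k. Centroids are simply sampled from the data to keep it short. This is an illustration, not Quake's search kernel, and `probe_search` is a hypothetical name.

```python
import heapq
import random


def sq_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_search(query, centroids, partitions, k, nprobe):
    """Return the IDs of the k closest vectors found in the nprobe nearest partitions."""
    nearest = heapq.nsmallest(nprobe, range(len(centroids)),
                              key=lambda c: sq_l2(query, centroids[c]))
    candidates = ((sq_l2(query, vec), vid) for c in nearest for vid, vec in partitions[c])
    return [vid for _, vid in heapq.nsmallest(k, candidates)]


# Toy data: 1,000 random 8-d vectors split among 16 partitions around sampled centroids.
rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
centroids = rng.sample(data, 16)
partitions = [[] for _ in centroids]
for vid, vec in enumerate(data):
    best = min(range(len(centroids)), key=lambda c: sq_l2(vec, centroids[c]))
    partitions[best].append((vid, vec))

query = [rng.gauss(0, 1) for _ in range(8)]
# Probing every partition is an exhaustive scan, so it yields the exact neighbors.
exact = probe_search(query, centroids, partitions, k=10, nprobe=len(centroids))
recalls = {}
for nprobe in (1, 4, 16):
    found = probe_search(query, centroids, partitions, k=10, nprobe=nprobe)
    recalls[nprobe] = len(set(found) & set(exact)) / len(exact)
print(recalls)
```

Raising `nprobe` scans more vectors and never lowers recall, at the cost of latency. Choosing that trade-off is what a recall target is meant to take off the user's hands.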

3. Updating the Index

Operation: Modify the index by removing and adding vectors.
Steps:
- Removal: Call remove() with the tensor of IDs to remove.
- Addition: Call add() with new vectors and their corresponding IDs.

Example:

```
remove_ids = torch.arange(100)         # Remove the vectors with IDs 0 to 99
add_vectors = torch.randn(100, 128)
add_ids = torch.arange(10000, 10100)   # IDs must be unique
index.remove(remove_ids)
index.add(add_vectors, add_ids)
```
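The bookkeeping behind a remove-then-add cycle can be pictured with a toy store in which every ID maps to exactly one partition. Removal deletes the ID from that partition, additions go to the partition with the nearest centroid, and duplicate IDs are rejected. `ToyPartitionedStore` is hypothetical and does not reflect Quake's internals; the README also does not say how Quake itself reacts to a duplicate or unknown ID.

```python
import random


def sq_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyPartitionedStore:
    """Hypothetical in-memory store that mimics remove/add bookkeeping only."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.partitions = [{} for _ in centroids]  # per partition: id -> vector
        self.location = {}                         # id -> partition index

    def add(self, vectors, ids):
        batch = set(ids)
        clashes = batch.intersection(self.location)
        if len(batch) != len(ids) or clashes:
            raise ValueError(f"IDs must be unique; already present: {sorted(clashes)}")
        for vid, vec in zip(ids, vectors):
            # Place each new vector in the partition with the nearest centroid.
            c = min(range(len(self.centroids)), key=lambda j: sq_l2(vec, self.centroids[j]))
            self.partitions[c][vid] = vec
            self.location[vid] = c

    def remove(self, ids):
        for vid in ids:
            c = self.location.pop(vid)  # this sketch raises KeyError for unknown IDs
            del self.partitions[c][vid]

    def __len__(self):
        return len(self.location)


rng = random.Random(0)


def randvecs(n):
    return [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]


store = ToyPartitionedStore(centroids=randvecs(8))
store.add(randvecs(1000), list(range(1000)))
store.remove(range(100))                           # mirrors index.remove(remove_ids)
store.add(randvecs(100), list(range(1000, 1100)))  # mirrors index.add(add_vectors, add_ids)
print(len(store))  # 1000
```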

4. Performing Maintenance

Operation: Trigger dynamic maintenance (e.g., partition splits or merges).
Steps:
- Simply call the maintenance() method.
- Inspect the returned timing information for details on splits/merges.
Example:
```
maintenance_info = index.maintenance()
```
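To picture what a split or merge does, the sketch below runs one maintenance pass under a simple size-threshold rule: partitions above `split_above` are split at the median of their highest-variance dimension, and partitions below `merge_below` are folded into the partition with the nearest centroid. The thresholds and the median split are assumptions for illustration only; Quake's own maintenance policy is not described in this README.

```python
import random


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(partition):
    """Median split of (id, vector) pairs along the highest-variance dimension."""
    vecs = [v for _, v in partition]
    mean = centroid(vecs)
    dim = max(range(len(mean)), key=lambda d: sum((v[d] - mean[d]) ** 2 for v in vecs))
    ordered = sorted(partition, key=lambda item: item[1][dim])
    half = len(ordered) // 2
    return [ordered[:half], ordered[half:]]


def maintain(partitions, split_above, merge_below):
    """One pass: split oversized partitions, then fold undersized ones into a neighbor."""
    result = []
    for part in partitions:
        result.extend(split(part) if len(part) > split_above else [part])
    while len(result) > 1:
        smallest = min(range(len(result)), key=lambda i: len(result[i]))
        if len(result[smallest]) >= merge_below:
            break
        part = result.pop(smallest)
        if not part:
            continue  # an empty partition is simply dropped
        c = centroid([v for _, v in part])
        target = min(range(len(result)),
                     key=lambda i: sq_l2(c, centroid([v for _, v in result[i]])))
        result[target] = result[target] + part
    return result


rng = random.Random(0)
partitions, next_id = [], 0
for size in (5, 120, 40, 3):
    partitions.append([(next_id + i, [rng.gauss(0, 1) for _ in range(4)]) for i in range(size)])
    next_id += size
after = maintain(partitions, split_above=100, merge_below=10)
print(sorted(len(p) for p in after))
```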

Further Documentation

For detailed instructions and advanced usage, please see the documentation: http://marius-project.org/quake/

Contact

For questions or contributions, please open an issue or reach out to [email protected]
