RPQ: Routing-Guided Learned Product Quantization for Graph-Based Approximate Nearest Neighbor Search

Quick Tour

Given a vector dataset $\mathcal{X}$ and a query vector $\vec{x}_q$, graph-based Approximate Nearest Neighbor Search (ANNS) builds a proximity graph (PG) as an index of $\mathcal{X}$ and approximately returns the vectors with minimum distances to $\vec{x}_q$ by searching over the PG index. It has been widely recognized that graph-based ANNS is effective and efficient; however, it struggles on large-scale $\mathcal{X}$ because the entire PG is too large to fit into memory, e.g., a billion-scale dataset would require more than 75 GB of memory for a PG index. To address this, Product Quantization (PQ) has been integrated with graph-based ANNS to reduce memory usage, by replacing a large PG over the original vectors with a smaller one over the compact codes of quantized vectors. Existing PQ methods do not consider the important routing features of a PG, thus producing low-quality quantized vectors that significantly degrade the ANNS's effectiveness. We present a general Routing-guided learned Product Quantization (RPQ) for graph-based ANNS, which can be easily adapted to existing popular PGs. Specifically, RPQ consists of (1) a differentiable quantizer that makes standard discrete PQ differentiable so it suits the back-propagation of end-to-end learning, (2) a sampling-based feature extractor that extracts the neighborhood and routing features of a PG using the quantized vectors, and (3) a multi-feature joint training module with two types of feature-aware losses that continuously optimizes the differentiable quantizer. As a result, the inherent features of a specific PG are embedded into the learned PQ, yielding high-quality quantized vectors that improve the graph-based ANNS's effectiveness. Moreover, we integrate RPQ with existing PGs and the state-of-the-art DiskANN to improve their effectiveness on large-scale datasets. Comprehensive experimental studies on real-world datasets demonstrate RPQ's superiority.
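The README does not spell out the quantizer's exact formulation, so the following is only an illustrative sketch of component (1): one common way to make discrete PQ differentiable is a softmax over negative sub-vector distances (soft codeword assignment) combined with a straight-through estimator. All names here (soft_pq_quantize, tau, the tensor shapes) are hypothetical, not the repository's API.

```python
import torch

def soft_pq_quantize(x, codebooks, tau=1.0):
    """Differentiable product quantization sketch (hypothetical, not RPQ's exact method).

    x:         (B, D) input vectors
    codebooks: (M, K, D/M) — M sub-spaces, K codewords per sub-space
    Returns the quantized vectors (hard PQ codes in the forward pass, soft
    assignment gradients in the backward pass) and the integer PQ codes.
    """
    B, D = x.shape
    M, K, d_sub = codebooks.shape
    subs = x.view(B, M, d_sub)                         # split into M sub-vectors
    # squared distance of every sub-vector to every codeword: (B, M, K)
    dists = ((subs.unsqueeze(2) - codebooks.unsqueeze(0)) ** 2).sum(-1)
    soft = torch.softmax(-dists / tau, dim=-1)         # soft assignment weights
    soft_q = torch.einsum('bmk,mkd->bmd', soft, codebooks)
    hard_idx = dists.argmin(-1)                        # (B, M) standard PQ codes
    hard_q = torch.stack([codebooks[m][hard_idx[:, m]] for m in range(M)], dim=1)
    # straight-through estimator: forward uses hard codes, gradients flow through soft
    q = hard_q + (soft_q - soft_q.detach())
    return q.reshape(B, D), hard_idx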

[Figure: RPQ workflow]

Baselines

  • Product Quantization (PQ): PQ [TPAMI'11] is a classic quantization algorithm that is widely used in billion-scale indices [NeurIPS'19, ECCV'18]; see the sketch after this list.

  • Optimized Product Quantization (OPQ): OPQ [TPAMI'14] jointly optimizes bit allocation and reconstruction error.

  • Link&Code: Link&Code [CVPR'18] designs a PQ-based method specifically for HNSW.

  • Catalyst: Catalyst [ICLR'19] uses a compression network to optimize quantization for similarity search, and DIM-RED [ICML'20] combines this method with graph-based ANNS methods.
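For reference, here is a minimal sketch of how the PQ and OPQ baselines are typically run with the faiss library. This is an assumption for illustration, not the repository's actual baseline scripts; the data here is random.

```python
import numpy as np
import faiss

d, M = 128, 16                                      # vector dim, sub-quantizers
xt = np.random.rand(50_000, d).astype('float32')    # training vectors
xb = np.random.rand(100_000, d).astype('float32')   # base vectors
xq = np.random.rand(100, d).astype('float32')       # queries

# PQ [TPAMI'11]: 16 sub-quantizers x 8 bits each -> 16-byte codes
pq = faiss.IndexPQ(d, M, 8)
pq.train(xt)
pq.add(xb)
D, I = pq.search(xq, 10)

# OPQ [TPAMI'14]: learned rotation before PQ, built via the index factory
opq = faiss.index_factory(d, "OPQ16,PQ16")
opq.train(xt)
opq.add(xb)
D, I = opq.search(xq, 10)
```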

Datasets

Dataset        | # Dimension | # Base    | # Query | # Train
BIGANN (link)  | 128         | 1,000,000 | 100,000 | 500,000
DEEP (link)    | 96          | 1,000,000 | 100,000 | 500,000
GIST (link)    | 960         | 1,000,000 | 10,000  | 500,000
UKBENCH (link) | 128         | 1,000,000 | 200     | 97,907
SIFT (link)    | 128         | 1,000,000 | 10,000  | 100,000
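These benchmarks are commonly distributed in .fvecs (float) and .bvecs (byte, e.g. BIGANN) format: each record is a little-endian int32 dimension d followed by the vector's d components. A minimal NumPy reader, assuming that layout (read_fvecs/read_bvecs and the file names are illustrative, not part of this repository):

```python
import numpy as np

def read_fvecs(path):
    """Read .fvecs: each record is an int32 dimension d followed by d float32 values."""
    raw = np.fromfile(path, dtype='float32')
    d = raw[:1].view('int32')[0]          # header of the first record gives d
    return raw.reshape(-1, d + 1)[:, 1:]  # drop the per-record dimension column

def read_bvecs(path):
    """Read .bvecs (e.g. BIGANN): an int32 dimension d followed by d uint8 values."""
    raw = np.fromfile(path, dtype='uint8')
    d = raw[:4].view('int32')[0]
    return raw.reshape(-1, d + 4)[:, 4:]  # drop the 4 header bytes per record

# e.g. base = read_fvecs('sift_base.fvecs'); queries = read_fvecs('sift_query.fvecs')
```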

Installation
