
Add vector search documentation #9135

Open · wants to merge 16 commits into main
Conversation

kolchfa-aws (Collaborator)

Adds a vector search section

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information about the Developer Certificate of Origin and signing off your commits, see here.

Signed-off-by: Fanit Kolchina <[email protected]>

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

|:---|:---|:---|:---|:---|
| Max dimensions | 16,000 | 16,000 | 16,000 | 16,000 |
| Filter | Post-filter | Post-filter | Post-filter | Filter during search |
| Training required | No | No | Yes | No |
Member


Faiss HNSW with PQ also requires training

redirect_from:
- /search-plugins/knn/knn-vector-quantization/
outside_cards:
- heading: "Byte vectors"
Member


We need to add a card for binary vectors alongside the byte vectors card:
https://opensearch.org/docs/latest/field-types/supported-field-types/knn-vector#binary-vectors

Collaborator Author


@naveentatikonda Thanks! I addressed both comments. Could you review commit a5e8b8d?

Member


@kolchfa-aws The changes look good. Thanks for making them.

For binary vectors, we also need to add a memory estimation. Here is the formula for HNSW:
1.1 * (dimension / 8 + 8 * M) bytes/vector

For IVF, I guess it is 1.1 * (((dimension / 8) * num_vectors) + (nlist * dimension / 8)) bytes. @jmazanec15, can you please confirm?
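The two formulas in this comment can be sketched as a quick estimator. This is a hypothetical helper for illustration, not part of the plugin, and the IVF formula is the unconfirmed one above:

```python
def hnsw_binary_memory_bytes(dimension: int, m: int, num_vectors: int) -> float:
    """Estimated native memory for a binary-vector HNSW graph.

    Per the formula in this thread: 1.1 * (dimension / 8 + 8 * M) bytes per vector.
    """
    return 1.1 * (dimension / 8 + 8 * m) * num_vectors


def ivf_binary_memory_bytes(dimension: int, num_vectors: int, nlist: int) -> float:
    """Estimated native memory for a binary-vector IVF index.

    Per the (unconfirmed) formula above:
    1.1 * (((dimension / 8) * num_vectors) + (nlist * dimension / 8)) bytes total.
    """
    return 1.1 * ((dimension / 8) * num_vectors + nlist * (dimension / 8))


# Example: 1M 1,024-dimensional binary vectors, HNSW with M = 16
print(hnsw_binary_memory_bytes(1024, 16, 1_000_000) / 1024**2)  # estimate in MiB
```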

Member


Yeah, that looks good to me.

@jmazanec15
Member

Hi @kolchfa-aws, this looks awesome!

In general, I think we should start moving the more low-level/expert details (like quantization and method configuration) out of the vector search section and into a detailed field reference section. Here is some high-level feedback:

  1. It'd be good to showcase/highlight some high-level features on the vector search splash page (https://kolchfa-aws.github.io/vector-search). In particular: filtering, multi-vector-per-document support (nested), automatic embedding generation, low-memory search, and hybrid search.
  2. For vector search techniques, can we add sections on sparse vectors and hybrid search?
  3. Can we add the knn query type to the query DSL?
  4. Can we have a page for space types and maintain only one table, possibly in the field reference? Also, can we move engine/method details into the field reference, with each engine having its own detailed section in the mapping reference?
  5. For examples/tutorials, can we use a basic mapping instead of a complex mapping:
PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "space_type": "l2"
      }
    }
  }
}

In performance tuning, we can mention picking a specific engine or overriding method parameters for expert-level fine-tuning, and point to the reference docs.
6. For quantization, can we put the low-level details into an expert section? Instead, in performance tuning, can we just mention that for memory optimization we use quantization to achieve 32x, 16x, 8x, 4x, and 2x compression_levels? I think we can even rename the disk-based section to "memory-optimized vector search". This is an example mapping:

PUT my-vector-index
{
  "settings" : {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "innerproduct",
        "data_type": "float",
        "mode": "on_disk",
        "compression_level": "16x"
      }
    }
  }
}

Quantization is a bit ugly for users to have to understand, so I think it belongs in the detailed field reference. We can say: for further fine-tuning of the quantization methods, see the field reference.
7. Can we move the SIMD optimization section to performance tuning: https://kolchfa-aws.github.io/vector-search/creating-vector-index/vector-field/#simd-optimization-for-the-faiss-engine? End users don't really need to know about this when reading about vector data types.
8. In optimizing vector search performance, can we add a page on cluster sizing? Basically, it should point to the memory estimation formulas and guidance on picking node configurations.
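To illustrate point 3, a minimal knn query against the basic mapping from point 5 could look like the following (the index and field names match that example mapping; the vector values and k are illustrative):

```json
GET /test-index/_search
{
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [1.5, 2.5, 3.5],
        "k": 2
      }
    }
  }
}
```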

Signed-off-by: Fanit Kolchina <[email protected]>