Indexing API for PG Vector and Postgres - more control over what is part of the hash from metadata #29624
carvana-holwerda
started this conversation in
Ideas
Replies: 1 comment
Found a way to do this without the feature, although it requires keeping a separate hash table of the full doc and checking it before sending documents to the indexing API.
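For anyone wanting to try the same workaround, here is a rough sketch of the idea. The reply didn't include code, so the details are guesses: `stable_hash`, `filter_changed`, the in-memory `seen_hashes` dict (standing in for the separate hash table), the per-document `id` field, and the dict-shaped documents are all hypothetical.

```python
import hashlib
import json

# Assumed names of the LLM-generated metadata keys that may drift between runs.
VOLATILE_KEYS = {"context", "keywords"}


def stable_hash(doc):
    """Hash the text plus the metadata that is expected to stay stable."""
    stable_metadata = {
        k: v for k, v in doc["metadata"].items() if k not in VOLATILE_KEYS
    }
    payload = json.dumps(
        {"page_content": doc["page_content"], "metadata": stable_metadata},
        sort_keys=True,  # serialization does not depend on key order
        default=str,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def filter_changed(docs, seen_hashes):
    """Return only the docs whose stable hash differs from the stored one.

    `seen_hashes` maps a (hypothetical) unique doc id to its last hash and is
    updated in place; in practice this would be a separate database table.
    """
    changed = []
    for doc in docs:
        digest = stable_hash(doc)
        if seen_hashes.get(doc["id"]) != digest:
            seen_hashes[doc["id"]] = digest
            changed.append(doc)
    return changed
```

Only the documents returned by `filter_changed` would then be passed on to `index(...)`. How skipping unchanged documents interacts with the chosen cleanup mode is worth checking for your own setup.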
Feature request
The ability to exclude specific metadata keys and their values from the hashed document.
Motivation
Document metadata can hold many different fields.
In a contextual RAG scenario, the context is stored in the metadata, so some keys and values can change simply because the doc is embedded again.
In my case, I don't want the context and keywords keys to be part of the hash, because their values may change.
I use an LLM to generate the context and keywords, and its output can differ slightly from run to run even though the actual doc has not changed, only some of the generated metadata.
Proposal (If applicable)
This could be either a per-request option or a global config for all index requests, e.g. the hash_metadata_key_exclusions parameter shown below:
result = index(
    documents,
    record_manager,
    vector_store,
    cleanup="incremental",
    source_id_key="source",
    hash_metadata_key_exclusions=["context", "keywords"],
)
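To make the intended behavior concrete, here is a minimal sketch of how a document hash could skip the excluded keys. It does not use LangChain's internals: `hash_document` and its plain-dict inputs are hypothetical stand-ins, and `hash_metadata_key_exclusions` is the parameter name proposed above, not an existing option.

```python
import hashlib
import json


def hash_document(page_content, metadata, hash_metadata_key_exclusions=()):
    """Hash the content plus metadata, skipping the excluded metadata keys.

    Hypothetical helper for illustration only; not part of LangChain.
    """
    excluded = set(hash_metadata_key_exclusions)
    kept_metadata = {k: v for k, v in metadata.items() if k not in excluded}
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(
        {"page_content": page_content, "metadata": kept_metadata},
        sort_keys=True,
        default=str,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same text and source, but the LLM-generated fields differ between runs.
first = hash_document(
    "Quarterly report text",
    {"source": "report.pdf", "context": "Summary A", "keywords": ["q1"]},
    hash_metadata_key_exclusions=["context", "keywords"],
)
second = hash_document(
    "Quarterly report text",
    {"source": "report.pdf", "context": "Summary B", "keywords": ["q1", "revenue"]},
    hash_metadata_key_exclusions=["context", "keywords"],
)
assert first == second  # the doc would be treated as unchanged
```

With the exclusion list empty, the two calls would produce different hashes, which is the re-indexing behavior this request is trying to avoid.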