-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support MV only for HNSW #1020
Support MV only for HNSW #1020
Conversation
96d5ecc
to
b3f6a57
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1020 +/- ##
=========================================
+ Coverage 0 73.81% +73.81%
=========================================
Files 0 82 +82
Lines 0 7488 +7488
=========================================
+ Hits 0 5527 +5527
- Misses 0 1961 +1961 |
include/knowhere/comp/index_param.h
Outdated
@@ -95,6 +95,8 @@ constexpr const char* JSON_ID_SET = "json_id_set"; | |||
constexpr const char* TRACE_ID = "trace_id"; | |||
constexpr const char* SPAN_ID = "span_id"; | |||
constexpr const char* TRACE_FLAGS = "trace_flags"; | |||
constexpr const char* SCALAR_INFO = "scalar_info"; | |||
constexpr const char* MV_ONLY_ENABLED = "mv_only_enabled"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rely on SCALAR_INFO empty or not to indicate whether we enable mv_only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a hidden config partition_key_isolation
in cardinal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config is only used in UT, could remove it
src/index/hnsw/faiss_hnsw.cc
Outdated
} | ||
|
||
Status | ||
Serialize(BinarySet& binset) const override { | ||
if (index == nullptr) { | ||
if (indexes.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe wrap 173-180 to an Empty()
func?
src/index/hnsw/faiss_hnsw.cc
Outdated
return Status::empty_index; | ||
} | ||
for (auto& index : indexes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe const
here?
|
||
try { | ||
MemoryIOWriter writer; | ||
faiss::write_index(index.get(), &writer); | ||
if (indexes.size() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only diffs are write_mv
and writeHeader
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
src/index/hnsw/faiss_hnsw.cc
Outdated
if (this->index == nullptr) { | ||
LOG_KNOWHERE_ERROR_ << "Can not add data to an empty index."; | ||
return Status::empty_index; | ||
// std::shared_ptr<faiss::Index> index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete?
} | ||
size_t first_valid_index = bitset.get_first_valid_index(); | ||
auto it = std::lower_bound(index_rows_sum.begin(), index_rows_sum.end(), | ||
label_to_internal_offset[first_valid_index] + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need +1 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, offset starts from 0 but rows starts from 1
src/index/hnsw/faiss_hnsw.cc
Outdated
@@ -1220,10 +1535,16 @@ class BaseFaissRegularIndexHNSWNode : public BaseFaissRegularIndexNode { | |||
expected<std::vector<IndexNode::IteratorPtr>> | |||
AnnIterator(const DataSetPtr dataset, std::unique_ptr<Config> cfg, const BitsetView& bitset, | |||
bool use_knowhere_search_pool) const override { | |||
if (index == nullptr) { | |||
LOG_KNOWHERE_WARNING_ << "creating iterator on empty index"; | |||
if (this->indexes.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
src/index/hnsw/faiss_hnsw.cc
Outdated
index_rows_sum.resize(tmp_combined_scalar_ids.size() + 1); | ||
labels.resize(tmp_combined_scalar_ids.size()); | ||
indexes.resize(tmp_combined_scalar_ids.size()); | ||
for (auto i = 0; i < tmp_combined_scalar_ids.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is similar in PQ/PRQ, Train/TrainInternal, is it possible to abstract the common?
@@ -319,6 +428,48 @@ static constexpr DataFormatEnum datatype_v = DataType2EnumHelper<T>::value; | |||
|
|||
namespace { | |||
|
|||
bool | |||
convert_rows_to_fp32(const void* const __restrict src_in, float* const __restrict dst, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add __restrict
to pointers in order to help the compiler
indexes_to_reconstruct_from[0]->reconstruct(id, result); | ||
} else { | ||
auto it = | ||
std::lower_bound(index_rows_sum.begin(), index_rows_sum.end(), label_to_internal_offset[id] + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any checks for it
pointing to lower_bound()
returning empty results?
|
||
inline BitsetViewIDSelector(BitsetView bitset_view, const size_t offset = 0) | ||
: bitset_view{bitset_view}, id_offset(offset) { | ||
inline BitsetViewIDSelector(BitsetView bitset_view, const size_t offset = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would to it differently. I'd add a new class which is derived from faiss::IDSelector
, which is a special case of what you wrote, which includes out_id_mapping
field that is guaranteed to be non-null
. I'd keep BitsetViewIDSelector
as it was. And I'd modify faiss::IDSelector* id_selector = (bitset.empty()) ? nullptr : &bw_idselector;
line (Search()
call) in faiss_hnsw.cc
correspondingly to allow this new BitsetViewWithMappingIDSelector
to be used
src/index/hnsw/faiss_hnsw.cc
Outdated
labels.resize(tmp_combined_scalar_ids.size()); | ||
indexes.resize(tmp_combined_scalar_ids.size()); | ||
const void* data = dataset->GetTensor(); | ||
for (auto i = 0; i < tmp_combined_scalar_ids.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this block the same in multiple functions down below?
thirdparty/faiss/faiss/index_io.h
Outdated
@@ -97,6 +97,15 @@ InvertedLists* read_InvertedLists(IOReader* reader, int io_flags = 0); | |||
// for backward compatibility | |||
Index *read_index_nm(IOReader *f, int io_flags = 0); | |||
void write_index_nm(const Index* idx, IOWriter* writer); | |||
|
|||
// additional helper function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a comment that these are knowhere
-specific functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
7bcfa6a
to
7253ece
Compare
b45f69b
to
d416ba2
Compare
d416ba2
to
1f4f4f7
Compare
1f4f4f7
to
0fdff8d
Compare
/lgtm overall |
0fdff8d
to
6dec4cf
Compare
6dec4cf
to
c41622c
Compare
Signed-off-by: chasingegg <[email protected]>
c41622c
to
c4e3d10
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chasingegg, PwzXxm The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
issue: #1019
/kind feature