Support MV only for HNSW #1020

Merged
merged 1 commit into from
Jan 14, 2025

chasingegg
Collaborator

issue: #1019
/kind feature

codecov bot commented Jan 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.81%. Comparing base (3c46f4c) to head (c4e3d10).
Report is 290 commits behind head on main.

@@            Coverage Diff            @@
##           main    #1020       +/-   ##
=========================================
+ Coverage      0   73.81%   +73.81%     
=========================================
  Files         0       82       +82     
  Lines         0     7488     +7488     
=========================================
+ Hits          0     5527     +5527     
- Misses        0     1961     +1961     

see 82 files with indirect coverage changes

@@ -95,6 +95,8 @@ constexpr const char* JSON_ID_SET = "json_id_set";
constexpr const char* TRACE_ID = "trace_id";
constexpr const char* SPAN_ID = "span_id";
constexpr const char* TRACE_FLAGS = "trace_flags";
constexpr const char* SCALAR_INFO = "scalar_info";
constexpr const char* MV_ONLY_ENABLED = "mv_only_enabled";
Collaborator

@PwzXxm PwzXxm Jan 10, 2025

Can we rely on whether SCALAR_INFO is empty to indicate whether mv_only is enabled?

Collaborator

There is a hidden config, partition_key_isolation, in cardinal.

Collaborator Author

This config is only used in unit tests, so it could be removed.

}

Status
Serialize(BinarySet& binset) const override {
if (index == nullptr) {
if (indexes.empty()) {
Collaborator

Maybe wrap lines 173-180 in an Empty() function?

return Status::empty_index;
}
for (auto& index : indexes) {
Collaborator

Maybe const here?


try {
MemoryIOWriter writer;
faiss::write_index(index.get(), &writer);
if (indexes.size() > 1) {
Collaborator

@PwzXxm PwzXxm Jan 10, 2025

Are write_mv and writeHeader the only differences?

Collaborator Author

Yes

if (this->index == nullptr) {
LOG_KNOWHERE_ERROR_ << "Can not add data to an empty index.";
return Status::empty_index;
// std::shared_ptr<faiss::Index> index;
Collaborator

delete?

}
size_t first_valid_index = bitset.get_first_valid_index();
auto it = std::lower_bound(index_rows_sum.begin(), index_rows_sum.end(),
label_to_internal_offset[first_valid_index] + 1);
Collaborator

Do we need +1 here?

Collaborator Author

Yes, the offset starts from 0 but the row counts start from 1.
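
To illustrate the +1 with a minimal sketch: assume index_rows_sum is a prefix-sum array whose first entry is 0, so entry k holds the total row count of the first k sub-indexes. The helper name find_partition and the sample sizes are hypothetical, not taken from the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: map a 0-based internal offset to the sub-index that holds it.
// Assumption: rows_sum[0] == 0 and rows_sum[k] is the row total of sub-indexes 0..k-1.
inline std::size_t
find_partition(const std::vector<std::size_t>& rows_sum, std::size_t offset) {
    // offset + 1 turns the 0-based offset into a 1-based row count, which is
    // what the prefix sums store.
    auto it = std::lower_bound(rows_sum.begin(), rows_sum.end(), offset + 1);
    // The entry found is the cumulative total that first covers this row;
    // the sub-index is one position to its left.
    return static_cast<std::size_t>(it - rows_sum.begin()) - 1;
}
```

With prefix sums {0, 3, 5, 9}, offset 3 is the first row of the second sub-index: lower_bound on 4 lands on the entry 5 at position 2, giving partition 1. Searching for 3 instead would land on position 1 and wrongly select partition 0.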

@@ -1220,10 +1535,16 @@ class BaseFaissRegularIndexHNSWNode : public BaseFaissRegularIndexNode {
expected<std::vector<IndexNode::IteratorPtr>>
AnnIterator(const DataSetPtr dataset, std::unique_ptr<Config> cfg, const BitsetView& bitset,
bool use_knowhere_search_pool) const override {
if (index == nullptr) {
LOG_KNOWHERE_WARNING_ << "creating iterator on empty index";
if (this->indexes.empty()) {
Collaborator

ditto

index_rows_sum.resize(tmp_combined_scalar_ids.size() + 1);
labels.resize(tmp_combined_scalar_ids.size());
indexes.resize(tmp_combined_scalar_ids.size());
for (auto i = 0; i < tmp_combined_scalar_ids.size(); ++i) {
Collaborator

This logic is similar in PQ/PRQ and in Train/TrainInternal; is it possible to abstract the common part?

@@ -319,6 +428,48 @@ static constexpr DataFormatEnum datatype_v = DataType2EnumHelper<T>::value;

namespace {

bool
convert_rows_to_fp32(const void* const __restrict src_in, float* const __restrict dst,
Collaborator

Please add __restrict to the pointers to help the compiler.
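
As a rough illustration of what the qualifier buys: __restrict (a compiler extension supported by GCC, Clang, and MSVC) promises that the two buffers do not overlap, so the compiler need not assume that a store through dst changes src and can vectorize the loop more freely. The function below is a hypothetical int8 analogue, not the PR's convert_rows_to_fp32.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: widen n int8 values to float.
// The caller must guarantee that src and dst do not alias.
inline void
widen_i8_to_fp32(const std::int8_t* const __restrict src, float* const __restrict dst,
                 const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<float>(src[i]);
    }
}
```

Passing overlapping buffers to a function declared this way is undefined behavior, so the qualifier only belongs on helpers whose callers always use distinct input and output storage.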

indexes_to_reconstruct_from[0]->reconstruct(id, result);
} else {
auto it =
std::lower_bound(index_rows_sum.begin(), index_rows_sum.end(), label_to_internal_offset[id] + 1);
Collaborator

Are there any checks for the case where lower_bound() finds no match and the iterator it equals end()?
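
One possible shape for such a check, as a sketch only: the helper name find_partition_checked is hypothetical, and index_rows_sum is assumed to be a prefix-sum array that starts with 0. It reports failure with std::nullopt (C++17) instead of using an end() iterator.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical checked lookup: returns the sub-index holding `offset`,
// or std::nullopt when the offset is past the last row.
inline std::optional<std::size_t>
find_partition_checked(const std::vector<std::size_t>& rows_sum, std::size_t offset) {
    auto it = std::lower_bound(rows_sum.begin(), rows_sum.end(), offset + 1);
    // end(): no cumulative total covers this row (also true for an empty vector).
    // begin(): the leading 0 entry is missing, so subtracting 1 would underflow.
    if (it == rows_sum.end() || it == rows_sum.begin()) {
        return std::nullopt;
    }
    return static_cast<std::size_t>(it - rows_sum.begin()) - 1;
}
```

A caller could then turn the empty result into an error status rather than reading out of range.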


inline BitsetViewIDSelector(BitsetView bitset_view, const size_t offset = 0)
: bitset_view{bitset_view}, id_offset(offset) {
inline BitsetViewIDSelector(BitsetView bitset_view, const size_t offset = 0,
Collaborator

I would do it differently. I'd add a new class derived from faiss::IDSelector, a special case of what you wrote, with an out_id_mapping field that is guaranteed to be non-null. I'd keep BitsetViewIDSelector as it was. And I'd modify the line faiss::IDSelector* id_selector = (bitset.empty()) ? nullptr : &bw_idselector; (the Search() call in faiss_hnsw.cc) accordingly, so that this new BitsetViewWithMappingIDSelector can be used.
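
A minimal sketch of that suggestion, under stated assumptions: IDSelector below is a local stand-in for faiss::IDSelector (only is_member is modeled), std::vector<bool> stands in for BitsetView with true meaning "filtered out", and the direction of the mapping (internal id to external id) is a guess. Taking the mapping by reference is one way to make it non-null by construction.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using idx_t = std::int64_t;

// Local stand-in for faiss::IDSelector so the sketch compiles without Faiss.
struct IDSelector {
    virtual bool is_member(idx_t id) const = 0;
    virtual ~IDSelector() = default;
};

// Hypothetical selector named after the review suggestion.
struct BitsetViewWithMappingIDSelector final : IDSelector {
    const std::vector<bool>& filtered;                 // stand-in for BitsetView
    const std::vector<std::uint32_t>& out_id_mapping;  // a reference, so never null

    BitsetViewWithMappingIDSelector(const std::vector<bool>& filtered_in,
                                    const std::vector<std::uint32_t>& mapping)
        : filtered(filtered_in), out_id_mapping(mapping) {
    }

    bool
    is_member(idx_t id) const override {
        // Translate the id first, then consult the filter.
        const std::uint32_t external = out_id_mapping[static_cast<std::size_t>(id)];
        return !filtered[external];
    }
};
```

With this split, the plain BitsetViewIDSelector keeps its branch-free path, and the call site chooses the mapping variant only when a mapping exists.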

labels.resize(tmp_combined_scalar_ids.size());
indexes.resize(tmp_combined_scalar_ids.size());
const void* data = dataset->GetTensor();
for (auto i = 0; i < tmp_combined_scalar_ids.size(); ++i) {
Collaborator

Is this block repeated in multiple functions below?

@@ -97,6 +97,15 @@ InvertedLists* read_InvertedLists(IOReader* reader, int io_flags = 0);
// for backward compatibility
Index *read_index_nm(IOReader *f, int io_flags = 0);
void write_index_nm(const Index* idx, IOWriter* writer);

// additional helper function
Collaborator

Please add a comment stating that these are knowhere-specific functions.

Collaborator Author

done

@mergify mergify bot removed the ci-passed label Jan 13, 2025
@chasingegg chasingegg force-pushed the support-mv branch 5 times, most recently from b45f69b to d416ba2 Compare January 13, 2025 10:07
@mergify mergify bot added the ci-passed label Jan 13, 2025
@alexanderguzhva
Collaborator

/lgtm overall

Signed-off-by: chasingegg <[email protected]>
@PwzXxm
Collaborator

PwzXxm commented Jan 14, 2025

/lgtm
/approve

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chasingegg, PwzXxm

@sre-ci-robot sre-ci-robot merged commit 2a2f5ef into zilliztech:main Jan 14, 2025
13 of 14 checks passed
@chasingegg chasingegg deleted the support-mv branch January 14, 2025 08:58
4 participants