Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pyknowhere - support dump and load index file #145

Merged
merged 1 commit into from
Oct 17, 2023

Conversation

alwayslove2013
Copy link
Collaborator

@alwayslove2013 alwayslove2013 commented Oct 12, 2023

/kind feature

Issue: #154

@alwayslove2013 alwayslove2013 changed the title [pyknowhere] support dump and load index file pyknowhere - support dump and load index file Oct 12, 2023
@mergify
Copy link

mergify bot commented Oct 12, 2023

@alwayslove2013 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!.

auto binary_map = binset -> binary_map_;
std::ofstream outfile;
outfile.open(file_name, std::ios::out | std::ios::trunc);
for (auto it = binary_map.begin(); it != binary_map.end(); ++it) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check is the file was opened successfully via if (outfile.good()) {}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Load(knowhere::BinarySetPtr binset, const std::string& file_name) {
std::ifstream infile;
infile.open(file_name, std::ios::in);
size_t name_len;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check is the file was opened via if (infile.good()) {}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// serialization: name_length(size_t); name(char[]); binset_size(size_t); binset(uint8[]);
auto name = it->first;
outfile << name.size();
outfile << name;
Copy link
Collaborator

@alexanderguzhva alexanderguzhva Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name.size() is type size_t and is platform dependent, please write and read sizes as uint64_t

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

int64_t size;
infile >> size;

auto data = new uint8_t[size];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any checks if the size is too large, negative or making no sense? :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

negative checked.
how to check for too large size? or we can just wait for the program to report an exception?


auto data = new uint8_t[size];
infile.read(reinterpret_cast<char*>(data), size);
std::shared_ptr<uint8_t[]> data_ptr(data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::shared_ptr<> first, read data into it next

Copy link
Collaborator Author

@alwayslove2013 alwayslove2013 Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about this
auto data = new uint8_t[size];
std::shared_ptr<uint8_t[]> data_ptr(data);
infile.read(reinterpret_cast<char*>(data_ptr.get()), size);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.
Also, please note that this code needs C++17, check https://stackoverflow.com/questions/13061979/can-you-make-a-stdshared-ptr-manage-an-array-allocated-with-new-t , I don't know how this shared_ptr<uint8_t[]> will be used later.

}
// end with 0
outfile << 0;
outfile.close();
Copy link
Collaborator

@alexanderguzhva alexanderguzhva Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outfile.close() is not needed here. I would rather use outfile.flush();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


binset->Append(name, data_ptr, size);
}
infile.close();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

infile.close(); is not needed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@alexanderguzhva
Copy link
Collaborator

/lgtm

@alexanderguzhva
Copy link
Collaborator

It needs to be noted somewhere that reading/writing errors are not exposed

Signed-off-by: min.tian <[email protected]>

review

Signed-off-by: min.tian <[email protected]>

fix pre-commit

Signed-off-by: min.tian <[email protected]>
@alwayslove2013
Copy link
Collaborator Author

@alexanderguzhva The load and dump API changes will only appear in pyknowhere, and will not participate in the compilation of knowhere c++. How to accurately expose error messages to Python is still rough in the current version, and seems to need to be redesigned.

By the way, I have squashed all commits into one, and LGTM label has been removed. Please give another one~

@alexanderguzhva
Copy link
Collaborator

/lgtm

@liliu-z
Copy link
Collaborator

liliu-z commented Oct 17, 2023

@alwayslove2013 issue link plz

@alwayslove2013
Copy link
Collaborator Author

related issue: #154

@liliu-z
Copy link
Collaborator

liliu-z commented Oct 17, 2023

/approve

@sre-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alwayslove2013, liliu-z

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit 95be2bb into zilliztech:main Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants