Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix heap buffer overflow caused by prefetch. #459

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions hnswlib/hnswalg.h
Original file line number Diff line number Diff line change
Expand Up @@ -260,8 +260,8 @@ class HierarchicalNSW : public AlgorithmInterface<dist_t> {
tableint candidate_id = *(datal + j);
// if (candidate_id == 0) continue;
#ifdef USE_SSE
_mm_prefetch((char *) (visited_array + *(datal + j + 1)), _MM_HINT_T0);
_mm_prefetch(getDataByInternalId(*(datal + j + 1)), _MM_HINT_T0);
_mm_prefetch((char *) (visited_array + *(datal + j)), _MM_HINT_T0);
_mm_prefetch(getDataByInternalId(*(datal + j)), _MM_HINT_T0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that fix defeats the purpose of the prefetch, as it need to be done for the next element.
Is there any evidence this prefetch affects the program behavior?
If so, I guess we probably should come with a different fix, e.g. min(j+1, max_value)

Copy link
Contributor Author

@kishorenc kishorenc May 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that a cache line is atleast 64 bytes, so does the extra +1 really matter? What matters is that the loop is moving forward via the j variable increments so we keep fetching ahead. We should be able to confirm this via a benchmark: I don't think there will be any degradation in performance.

There are actually a couple of other places the same issue happens, which I just noticed:

(Edit: updated PR to cover these as well)

Is there any evidence this prefetch affects the program behavior?

There is a buffer overflow here by accessing past the array length. Whether that causes a segfault will depend on hardware.

Copy link
Contributor Author

@kishorenc kishorenc May 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think the proper way to actually use a _mm_prefetch is to fetch a full 64-bytes ahead, for example how it's done here by fetching "8 doubles ahead":

https://stackoverflow.com/questions/73012072/why-such-address-is-used-with-mm-prefetch

But this is quite tricky to implement and test because modern processors have gotten really good at prefetching without any hints.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_mm_prefetch((char *) (visited_array + *(datal + j + 1)), _MM_HINT_T0);

+1 fetches the visited bit for the next item, which can be anywhere in the memory compared the previous element. Almost certain different cache and page and at least skylake's cpus were struggling without the prefetches, that is why it ended up in the code despite the obvious issue.

Yeah, it would be great if you could profile the performance on few different x64 platforms and maybe improve the prefetching performance and hopefully fix this out-of-bound issue that made people uneasy.

I think the proper way to test the performance is to do search on small dimensional data to focus on memory sparsity (I used d=4) and keep two implementations of the search function (with and without the change), so the performance would not depend on random memory layout changes in the data. That should allow measuring the performance within 1-2% accuracy.

#endif
if (visited_array[candidate_id] == visited_array_tag) continue;
visited_array[candidate_id] = visited_array_tag;
Expand Down