-
Notifications
You must be signed in to change notification settings - Fork 682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix heap buffer overflow caused by prefetch. #459
Open
kishorenc
wants to merge
4
commits into
nmslib:master
Choose a base branch
from
typesense:fix-prefetch
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 2 commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that fix defeats the purpose of the prefetch, as it need to be done for the next element.
Is there any evidence this prefetch affects the program behavior?
If so, I guess we probably should come with a different fix, e.g. min(j+1, max_value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that a cache line is atleast 64 bytes, so does the extra
+1
really matter? What matters is that the loop is moving forward via thej
variable increments so we keep fetching ahead. We should be able to confirm this via a benchmark: I don't think there will be any degradation in performance.There are actually a couple of other places the same issue happens, which I just noticed:
(Edit: updated PR to cover these as well)
There is a buffer overflow here by accessing past the array length. Whether that causes a segfault will depend on hardware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think the proper way to actually use a
_mm_prefetch
is to fetch a full 64-bytes ahead, for example how it's done here by fetching "8 doubles ahead":https://stackoverflow.com/questions/73012072/why-such-address-is-used-with-mm-prefetch
But this is quite tricky to implement and test because modern processors have gotten really good at prefetching without any hints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 fetches the visited bit for the next item, which can be anywhere in the memory compared the previous element. Almost certain different cache and page and at least skylake's cpus were struggling without the prefetches, that is why it ended up in the code despite the obvious issue.
Yeah, it would be great if you could profile the performance on few different x64 platforms and maybe improve the prefetching performance and hopefully fix this out-of-bound issue that made people uneasy.
I think the proper way to test the performance is to do search on small dimensional data to focus on memory sparsity (I used d=4) and keep two implementations of the search function (with and without the change), so the performance would not depend on random memory layout changes in the data. That should allow measuring the performance within 1-2% accuracy.