-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added within_unsorted and within_count #52
Conversation
@mrhooray Sorry to mention you personally. I just want to make sure you see this PR. |
Ok(evaluated.into_iter().map(Into::into).collect()) | ||
} | ||
|
||
pub fn within_count<F>(&self, point: &[A], radius: A, distance: &F) -> Result<usize, ErrorKind> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @abstractqqq
within_count
looks very similar to within_unsorted
- do you think it's strictly needed?
It seems the same can be achieved via within_unsorted().len()
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right that it can be achieved like that. However, I don't think they compile to the same code, as the runtime has a difference. Doing into_iter will consume the heap and the extra step mapping HeapElement to a tuple still have small cost which will add up depending on heap size.
I am building an application in which one tree is built, and for each element in a column, I need to run a within_count query, and I care about every millisecond spent on this operation. As you can see in the screenshot above, when radius increases, the difference between the runtime gets larger.
From a UX point of view, having a within_count is also more convenient than doing within_unsorted().len().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the difference comes from the final return, do you think it's worthwhile to extract L124-132/L144-152 to avoid future (unintended) divergence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I factored out the common part as a function 👍
The lint failure is not a problem of this PR, but something in heap_element.rs. Could you please take a look? Thank you. |
In some scientific computing tasks, we only need the neighbors within radius (with no requirement on being ordered), and there is no need to sort them, or we only need the neighbor count.
I implemented within_unsorted and within_count to deal with these situations more efficiently.
To check:
run tests and bench