Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added within_unsorted and within_count #52

Merged
merged 4 commits into from
Feb 2, 2024

Conversation

abstractqqq
Copy link
Contributor

@abstractqqq abstractqqq commented Dec 26, 2023

In some scientific computing tasks, we only need the neighbors within radius (with no requirement on being ordered), and there is no need to sort them, or we only need the neighbor count.

I implemented within_unsorted and within_count to deal with these situations more efficiently.

To check:
run tests and bench

image

@abstractqqq
Copy link
Contributor Author

@mrhooray Sorry to mention you personally. I just want to make sure you see this PR.

Ok(evaluated.into_iter().map(Into::into).collect())
}

pub fn within_count<F>(&self, point: &[A], radius: A, distance: &F) -> Result<usize, ErrorKind>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @abstractqqq
within_count looks very similar to within_unsorted - do you think it's strictly needed?
It seems the same can be achieved via within_unsorted().len() or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right that it can be achieved like that. However, I don't think they compile to the same code, as the runtime has a difference. Doing into_iter will consume the heap and the extra step mapping HeapElement to a tuple still have small cost which will add up depending on heap size.

I am building an application in which one tree is built, and for each element in a column, I need to run a within_count query, and I care about every millisecond spent on this operation. As you can see in the screenshot above, when radius increases, the difference between the runtime gets larger.

From a UX point of view, having a within_count is also more convenient than doing within_unsorted().len().

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the difference comes from the final return, do you think it's worthwhile to extract L124-132/L144-152 to avoid future (unintended) divergence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I factored out the common part as a function 👍

@abstractqqq
Copy link
Contributor Author

@mrhooray

The lint failure is not a problem of this PR, but something in heap_element.rs. Could you please take a look? Thank you.

@mrhooray mrhooray merged commit 4f3755e into mrhooray:master Feb 2, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants