Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added within_unsorted and within_count #52

Merged
merged 4 commits into from
Feb 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
96 changes: 96 additions & 0 deletions benches/bench.rs
Original file line number Diff line number Diff line change
Expand Up @@ -40,3 +40,99 @@ fn bench_nearest_from_kdtree_with_1k_3d_points(b: &mut Bencher) {
}
b.iter(|| kdtree.nearest(&point.0, 8, &squared_euclidean).unwrap());
}

#[bench]
fn bench_within_2k_data_01_radius(b: &mut Bencher) {
let len = 2000usize;
let point = rand_data();
let mut points = vec![];
let mut kdtree = KdTree::with_capacity(3, 16);
for _ in 0..len {
points.push(rand_data());
}
for i in 0..points.len() {
kdtree.add(&points[i].0, points[i].1).unwrap();
}

b.iter(|| kdtree.within(&point.0, 0.1, &squared_euclidean).unwrap());
}

#[bench]
fn bench_within_2k_data_02_radius(b: &mut Bencher) {
let len = 2000usize;
let point = rand_data();
let mut points = vec![];
let mut kdtree = KdTree::with_capacity(3, 16);
for _ in 0..len {
points.push(rand_data());
}
for i in 0..points.len() {
kdtree.add(&points[i].0, points[i].1).unwrap();
}

b.iter(|| kdtree.within(&point.0, 0.2, &squared_euclidean).unwrap());
}

#[bench]
fn bench_within_unsorted_2k_data_01_radius(b: &mut Bencher) {
let len = 2000usize;
let point = rand_data();
let mut points = vec![];
let mut kdtree = KdTree::with_capacity(3, 16);
for _ in 0..len {
points.push(rand_data());
}
for i in 0..points.len() {
kdtree.add(&points[i].0, points[i].1).unwrap();
}

b.iter(|| kdtree.within_unsorted(&point.0, 0.1, &squared_euclidean).unwrap());
}

#[bench]
fn bench_within_unsorted_2k_data_02_radius(b: &mut Bencher) {
let len = 2000usize;
let point = rand_data();
let mut points = vec![];
let mut kdtree = KdTree::with_capacity(3, 16);
for _ in 0..len {
points.push(rand_data());
}
for i in 0..points.len() {
kdtree.add(&points[i].0, points[i].1).unwrap();
}

b.iter(|| kdtree.within_unsorted(&point.0, 0.2, &squared_euclidean).unwrap());
}

#[bench]
fn bench_within_count_2k_data_01_radius(b: &mut Bencher) {
let len = 2000usize;
let point = rand_data();
let mut points = vec![];
let mut kdtree = KdTree::with_capacity(3, 16);
for _ in 0..len {
points.push(rand_data());
}
for i in 0..points.len() {
kdtree.add(&points[i].0, points[i].1).unwrap();
}

b.iter(|| kdtree.within_count(&point.0, 0.1, &squared_euclidean).unwrap());
}

#[bench]
fn bench_within_count_2k_data_02_radius(b: &mut Bencher) {
let len = 2000usize;
let point = rand_data();
let mut points = vec![];
let mut kdtree = KdTree::with_capacity(3, 16);
for _ in 0..len {
points.push(rand_data());
}
for i in 0..points.len() {
kdtree.add(&points[i].0, points[i].1).unwrap();
}

b.iter(|| kdtree.within_count(&point.0, 0.2, &squared_euclidean).unwrap());
}
43 changes: 38 additions & 5 deletions src/kdtree.rs
Original file line number Diff line number Diff line change
Expand Up @@ -93,14 +93,11 @@ impl<A: Float + Zero + One, T: std::cmp::PartialEq, U: AsRef<[A]> + std::cmp::Pa
.collect())
}

pub fn within<F>(&self, point: &[A], radius: A, distance: &F) -> Result<Vec<(A, &T)>, ErrorKind>
#[inline(always)]
fn evaluated_heap<F>(&self, point: &[A], radius: A, distance: &F) -> BinaryHeap<HeapElement<A, &T>>
where
F: Fn(&[A], &[A]) -> A,
{
self.check_point(point)?;
if self.size == 0 {
return Ok(vec![]);
}
let mut pending = BinaryHeap::new();
let mut evaluated = BinaryHeap::<HeapElement<A, &T>>::new();
pending.push(HeapElement {
Expand All @@ -110,9 +107,45 @@ impl<A: Float + Zero + One, T: std::cmp::PartialEq, U: AsRef<[A]> + std::cmp::Pa
while !pending.is_empty() && (-pending.peek().unwrap().distance <= radius) {
self.nearest_step(point, self.size, radius, distance, &mut pending, &mut evaluated);
}
evaluated
}

pub fn within<F>(&self, point: &[A], radius: A, distance: &F) -> Result<Vec<(A, &T)>, ErrorKind>
where
F: Fn(&[A], &[A]) -> A,
{
self.check_point(point)?;
if self.size == 0 {
return Ok(vec![]);
}
let evaluated = self.evaluated_heap(point, radius, distance);
Ok(evaluated.into_sorted_vec().into_iter().map(Into::into).collect())
}

pub fn within_unsorted<F>(&self, point: &[A], radius: A, distance: &F) -> Result<Vec<(A, &T)>, ErrorKind>
where
F: Fn(&[A], &[A]) -> A,
{
self.check_point(point)?;
if self.size == 0 {
return Ok(vec![]);
}
let evaluated = self.evaluated_heap(point, radius, distance);
Ok(evaluated.into_iter().map(Into::into).collect())
}

pub fn within_count<F>(&self, point: &[A], radius: A, distance: &F) -> Result<usize, ErrorKind>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @abstractqqq
within_count looks very similar to within_unsorted - do you think it's strictly needed?
It seems the same can be achieved via within_unsorted().len() or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right that it can be achieved like that. However, I don't think they compile to the same code, as the runtime has a difference. Doing into_iter will consume the heap and the extra step mapping HeapElement to a tuple still have small cost which will add up depending on heap size.

I am building an application in which one tree is built, and for each element in a column, I need to run a within_count query, and I care about every millisecond spent on this operation. As you can see in the screenshot above, when radius increases, the difference between the runtime gets larger.

From a UX point of view, having a within_count is also more convenient than doing within_unsorted().len().

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the difference comes from the final return, do you think it's worthwhile to extract L124-132/L144-152 to avoid future (unintended) divergence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I factored out the common part as a function 👍

where
F: Fn(&[A], &[A]) -> A,
{
self.check_point(point)?;
if self.size == 0 {
return Ok(0);
}
let evaluated = self.evaluated_heap(point, radius, distance);
Ok(evaluated.len())
}

fn nearest_step<'b, F>(
&self,
point: &[A],
Expand Down
33 changes: 33 additions & 0 deletions tests/kdtree.rs
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,39 @@ fn it_works() {
vec![(0.0, &1), (2.0, &2), (2.0, &0)]
);

let unsorted1 = kdtree.within_unsorted(&POINT_A.0, 0.0, &squared_euclidean).unwrap();
let ans1 = vec![(0.0, &0)];
assert_eq!(unsorted1.len(), ans1.len());
assert_eq!(
kdtree.within_count(&POINT_A.0, 0.0, &squared_euclidean).unwrap(),
ans1.len()
);
for item in unsorted1 {
assert!(ans1.contains(&item));
}

let unsorted2 = kdtree.within_unsorted(&POINT_B.0, 1.0, &squared_euclidean).unwrap();
let ans2 = vec![(0.0, &1)];
assert_eq!(unsorted2.len(), ans2.len());
assert_eq!(
kdtree.within_count(&POINT_B.0, 1.0, &squared_euclidean).unwrap(),
ans2.len()
);
for item in unsorted2 {
assert!(ans2.contains(&item));
}

let unsorted3 = kdtree.within_unsorted(&POINT_B.0, 2.0, &squared_euclidean).unwrap();
let ans3 = vec![(0.0, &1), (2.0, &2), (2.0, &0)];
assert_eq!(unsorted3.len(), ans3.len());
assert_eq!(
kdtree.within_count(&POINT_B.0, 2.0, &squared_euclidean).unwrap(),
ans3.len()
);
for item in unsorted3 {
assert!(ans3.contains(&item));
}

assert_eq!(
kdtree
.iter_nearest(&POINT_A.0, &squared_euclidean)
Expand Down
Loading