Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up Relate for PreparedGeometry #1317

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

michaelkirk
Copy link
Member

@michaelkirk michaelkirk commented Feb 27, 2025

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Primarily: We cache the bounding_rect computation on the PreparedGeometry, which gets us most of our perf gain.

Secondarily: I realized we can skip building the geometry graph if the bbox check fails (which is a modest speedup for both prepared and unprepared geometries).

$ cargo bench --bench prepared_geometry -- --baseline=main
   Compiling geo v0.29.3 (/Users/mkirk/src/georust/geo/geo)
   Compiling jts-test-runner v0.1.0 (/Users/mkirk/src/georust/geo/jts-test-runner)
    Finished `bench` profile [optimized] target(s) in 6.35s
     Running benches/prepared_geometry.rs (target/release/deps/prepared_geometry-b11c776cecdacd91)
Gnuplot not found, using plotters backend
relate prepared polygons
                        time:   [25.355 ms 25.456 ms 25.562 ms]
                        change: [-47.861% -47.612% -47.365%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking relate unprepared polygons: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 82.0s, or reduce sample count to 10.
relate unprepared polygons
                        time:   [761.98 ms 764.63 ms 767.56 ms]
                        change: [-6.1618% -5.7402% -5.3188%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

== Bench

$ cargo bench --bench prepared_geometry -- --baseline=main
   Compiling geo v0.29.3 (/Users/mkirk/src/georust/geo/geo)
   Compiling jts-test-runner v0.1.0 (/Users/mkirk/src/georust/geo/jts-test-runner)
    Finished `bench` profile [optimized] target(s) in 4.65s
     Running benches/prepared_geometry.rs (target/release/deps/prepared_geometry-b11c776cecdacd91)
Gnuplot not found, using plotters backend
relate prepared polygons
                        time:   [32.210 ms 32.290 ms 32.383 ms]
                        change: [-33.711% -33.548% -33.350%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Benchmarking relate unprepared polygons: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 80.0s, or reduce sample count to 10.
relate unprepared polygons
                        time:   [799.79 ms 801.23 ms 802.67 ms]
                        change: [-1.5726% -1.2283% -0.9119%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
…ameters

Is this what progress looks like? 😭
== Bench

NOTE: this is vs. *main*, so the prepared_geometry bench was already -33.548% faster vs main.

$ cargo bench --bench prepared_geometry -- --baseline=main
   Compiling geo v0.29.3 (/Users/mkirk/src/georust/geo/geo)
   Compiling jts-test-runner v0.1.0 (/Users/mkirk/src/georust/geo/jts-test-runner)
    Finished `bench` profile [optimized] target(s) in 6.35s
     Running benches/prepared_geometry.rs (target/release/deps/prepared_geometry-b11c776cecdacd91)
Gnuplot not found, using plotters backend
relate prepared polygons
                        time:   [25.355 ms 25.456 ms 25.562 ms]
                        change: [-47.861% -47.612% -47.365%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking relate unprepared polygons: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 82.0s, or reduce sample count to 10.
relate unprepared polygons
                        time:   [761.98 ms 764.63 ms 767.56 ms]
                        change: [-6.1618% -5.7402% -5.3188%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
where
Self: Sized,
{
RelateOperation::new(self, other).compute_intersection_matrix()
Copy link
Member Author

@michaelkirk michaelkirk Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the crux of the change - rather than passing in the geometry_graph, we pass in the thing that builds the geometry_graph (which could be a Geometry or a PreparedGeometry).

@michaelkirk
Copy link
Member Author

@gauteh you might be interested in trying this out.

use crate::Intersects;
match (
self.graph_a.geometry().bounding_rect(),
self.graph_b.geometry().bounding_rect(),
self.geometry_a.bounding_rect().into(),
Copy link
Member Author

@michaelkirk michaelkirk Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously: self.graph_a.geometry() is the (unprepared) GeometryCow, so we'd do a bbox calculation here.

Now self.geometry_a can be a PreparedGeometry, which has a cached bounding_rect. Or if you call Relate with an unprepared geometry, it will continue to do the old dynamic bbox calculation.

Copy link
Contributor

@dabreegster dabreegster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an impressive speedup, a clear win! A few questions about the code, likely just because this is an unfamiliar codebase

@@ -108,11 +108,11 @@ impl<F: GeoFloat> LabeledEdgeEndBundleStar<F> {
let left_position = label.position(geom_index, Direction::Left);
let right_position = label.position(geom_index, Direction::Right);

if let Some(right_position) = right_position {
if let Some(_right_position) = right_position {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? I thought leading underscore was convention for "allow this unused variable", but it gets used below. Is this just because the debug assertion gets compiled out sometimes and you get an error otherwise?

Copy link
Member Author

@michaelkirk michaelkirk Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. Specifically it's only used in a debug assert, which gets compiled out in release builds, thus producing an "unused" warning.

let r_tree = geometry_graph.build_tree();

let envelope = r_tree.root().envelope();
let bounding_rect: Option<Rect<F>> = if envelope.lower().x > envelope.upper().x {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or lower().y > upper().y? Why is only one of these checks sufficient? Or is this just looking for the case when envelope() returns lower == upper?

Copy link
Member Author

@michaelkirk michaelkirk Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This definitely warrants a comment in the code, I'll add something like this:

rstar::AABB envelope implementation has a lower point: (min_x, min_y) and an upper point (max_x, max_y).

But rstar needs to be able to represent an empty envelopes (e.g. the envelope of a POLYGON EMPTY). Instead of doing something explicit like Option<Envelope>, the rstar implementation returns an "invalid" envelope where lower has the max value and upper has the min values.

   fn new_empty() -> Self {
        let max = P::Scalar::max_value();
        let min = P::Scalar::min_value();
        Self {
            lower: P::from_value(max),
            upper: P::from_value(min),
        }
    }

This empty envelope represents the only time this invariant is broken, so if one of the dimensions (x) is invalid, the other is too. It would be sufficient to check either variable. We could check both, but it would be redundant in all cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth a fn is_empty() if many places check for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the only place we're checking AFAIK. envelope.is_empty() seems like a nice addition for rstar. I'm not super active in that repo, but I'll add a comment for now, and then propose this change upstream.

Copy link
Member Author

@michaelkirk michaelkirk Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I've come up with a clearer way to write this. Take another look @dabreegster eee42d7

I still think the is_empty might be worth adding, but it seems less urgent.

Updated: opened PR here: georust/rstar#190

///
/// # Params
///
/// `idx`: 0 or 1, designating A or B (respectively) in the role this geometry plays
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guessing it's overkill to make this more typesafe with an enum?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically this is a reflection of the reference implementation — a named enum would definitely be less cryptic.

I think we could do it, but it would be kind of a broad change - ultimately it's used throughout the geometry graph code as an index of various [T; 2].

Since it doesn't represent new complexity in this PR (I've only added documentation here), I'd prefer not to do that refactor in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not suggesting a change. Thanks for the explanation

Copy link
Contributor

@dabreegster dabreegster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants