-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prepared Geometries #803
Comments
This would be great for us at cognite. We have an intersection operation that amounts to taking a big |
@urschrei I've gotten started on a possible implementation of this, but wanted to check and see if I'm on the right track before sinking too much time into it. From profiling, it looks like a big chunk of the time, at least for the One potentially serious issue with this approach, though: currently the All of this is looking like it's going to be pretty invasive, though: lots of places are going to need to accept IDs that don't currently, and there's going to need to be a new layer of indirection introduced in several places to map from IDs to 0/1 indexes. Does this generally seem like an okay approach, or should I be pre-building/caching something at a different layer of abstraction? (Or alternatively, still pre-building a |
This seems like a good approach.
My first instinct would be to revisit the design of |
I’m glad to see someone looking at this!
I’m traveling at the moment, but will be able to better respond next week. I’m definitely interested!
… On Jul 13, 2022, at 11:23, Stephan Hügel ***@***.***> wrote:
I was thinking of creating a PreparedGeometry struct that would wrap geometries, and would prebuild one of these GeometryGraph and stash it at build time. Conveniently, the only operation that currently needs to mutate a GeometryGraph after construction at present looks like the compute_self_nodes method, and it doesn't depend on having access to the other geometry being compared, so I ought to just be able to move that operation into the GeometryGraph constructor and then have everything else be able to operate on an immutable graph (I'd probably then add a trait to get a GeometryGraph for a given geometry, which would build one on the fly for regular geometries and return a reference to the already-built one for the PreparedGeometry, and then use that in the RelateOperation constructor).
This seems like a good approach.
All of this is looking like it's going to be pretty invasive, though: lots of places are going to need to accept IDs that don't currently, and there's going to need to be a new layer of indirection introduced in several places to map from IDs to 0/1 indexes. Does this generally seem like an okay approach, or should I be pre-building/caching something at a different layer of abstraction? (Or alternatively, still pre-building a GeometryGraph, but changing its design in some more-fundamental way so it doesn't need IDs or indexes or whatnot.)
My first instinct would be to revisit the design of GeometryGraph to try to simplify these operations if that lets us avoid some of the complexity of the approach you outlined above. It may be the case that the unique ID design you sketched out is necessary, but as you say it involves quite a lot of invasive incidental complexity. I'm going to tag @michaelkirk in here as he's the author of the Relate functionality and its underlying mechanism, and may have some input here. Could you also link to a WIP branch if you have one?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you were mentioned.
|
I had just a little bit of time to look at this before next week, so here are some of my thoughts so far: Some background: the existing Relate trait and GeometryGraph is strongly inspired by the approach used in JTS. So it might be useful to see how JTS or GEOS handles this (if it indeed handles this at all).
Yes indeed, the GeometryGraph is pretty much the whole enchilada, and I think it makes sense to see what work within there we can cache. After briefly reviewing the implementation in geo: You're right - the notion of geom_index is used heavily throughout the implementation, which makes it non-trivial to cache on the geometry. The idea of some shared global state to track the ID of each polygon sounds plausible, but "shared global" gives me pause. I'm not 100% allergic to it, but it'd be really great if we can find an approach which doesn't require any kind of global mutex thing to protect a counter.
In theory, I think caching the "self-noding" work could be helpful. However note that, the current data structure holds some other pieces of state on the graph that later get mutated (after
Inuitively this sounds appealing! But I haven't quite worked out how this would work. Like maybe instead of caching the entire GeometryGraph on the PreparedGeometry, we extract some subset of cacheable state called something like I can respond more in depth next week! |
Assuming the prepared geom holds a reference to the original geom, we could use the reference pointer casted as usize. Would that work as a pseudo unique index? |
Still thinking through some other design aspects and don't have anything working yet, but definitely will drop a link when I do.
Yeah will do. I did notice when profiling the application whose slowness motivated me to jump down this rabbithole that when attempting the same set of operations with geos instead to see if that was faster, it looks like
Oh, me too 😅 . I'm not sure I have a better idea yet, but will keep thinking about it.
FWIW, the types in std::atomic typically use instruction-level compare-and-swap on most architectures from what I understand, and are guaranteed to be lock-free per their docs, so there shouldn't be any concerns about contention around a mutex.
This is a good call-out... I had gotten something kind-of-sort-of-working with a subset of the changes proposed above, and tests pass, but I think it's because none of the tests try intersecting the same geometries more than once, so probably I've just lucked out, and need to think more about how to deal with this potential mutation.
If possible I'd love for it to at least be possible for the prepared geometry to own the underlying geometry... I think as a practical matter, a big motivation for having a mechanism like this will be because someone wants to hang onto a prepared geometry for awhile so they can repeatedly query it, and I think if the prepared geometry just has a reference to some other geometry that you also have to keep around (but can't move because of the outstanding reference, and can't put in the same struct as the prepared geometry due to self-referential structs being disallowed), it will pretty significantly diminish the utility. It seems like maybe we'd want regular geometries to have an Anyhow, I think that'd make it hard to do the pointer-casting ID thing; with owned data, there's at least some risk that more than one geometry gets created with the same memory address as an earlier one that had been created and subsequently moved, so I don't think we'd be able to guarantee uniqueness. |
I have been experimenting with some things here: That branch is currently quite a mess and includes a lot of dead ends that I'd want to remove, but I was curious just to get an idea for the kinds of savings we could hope to see. By caching both the RTree and the self-intersection step, and being smart about when we clone things, I'm seeing savings of about 80%.
The basic approach I took was that a My solution to the array indexing problem @apendleton highlighted was to clone the partially constructed geometry graph and just swap all the labels around as necessary depending on wether the geometry is in the A or B position. It's admittedly brain dead, but it seems to work. I'm not tied to much of the exact approach laid out in that branch — in fact I'm not really happy with a lot of the changes as they currently exist, but I think it serves as a good POC of the savings we can hope to achieve. Beyond cleanup, additional work exists to make sure it's ergonomic. Like I expect people might want to mix and match prepared and unprepared polygons. currently you can either:
but I expect people would want to mix-and-match, like:
|
I'd be keen to see this in geo crate :) |
Chipping in to say that this would be very useful! (#1184) |
JTS (and, subsequently, GEOS and several of its related downstream tools such as Shapely and PostGIS) has had the concept of a "prepared geometry" for a long time now. The idea is that upon creation, the "prepared" geometry creates a spatial index (an
rstar
r*-tree, in our case), which is a relatively expensive operation whose cost is amortised by the time savings from repeated calls to the index when e.g. checking intersections against a large number of geometries.The types of "real-world" operations that benefit from this approach are pretty common, so it's worth experimenting with something similar in
geo
. Having thought about it for 30 seconds, I think it would have to be a struct rather than a trait.This what the
geos
crate does, currently: https://docs.rs/geos/latest/geos/struct.PreparedGeometry.htmlThe text was updated successfully, but these errors were encountered: