-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gazetteer model #3979
base: development
Are you sure you want to change the base?
Gazetteer model #3979
Conversation
Currently thinking a task will be best for these.
Including TODOs for my own reference
…vs st_distance_spheroid Apologies for getting somewhat opinionated here, I hope I'm not too far off... The name and the description of `select_geography_sql` (as well as comparison with `select_geometry_sql`) suggest that the return value should be of type `geography` (it was previously being cast to type geometry). But then I'm confused about `st_distance` vs `st_distance_spheroid`: * in `st_distance_spheroid` we cast geography types to geometry types, but then do distance calculations on a spheroid. * in `st_distance`, prior to the `select_geography_sql` change here, we were calculating planar "distance" between two geometry arguments and then converting it to meters - but the use of `select_geography_sql` in that function suggests that what was intended was to calculate `ST_Distance` between two geography points, which is a spheroidal calculation. So I would expect the same answer in both cases, which is what I'm getting here. As to which is faster or slower, I can't say, but `st_distance` seems more natural to me, though `st_distance_spheroid` is the one that's used in other places besides specs. If we really wanted a faster, less accurate calculation I think it would be `st_distance` between two geometry types (and then converting to meters? I'm unclear there) - should one of these be converted to that?
These are function that are either completely unreferenced or only referenced in specs that only test the deprecated function itself (i.e. not used to test some aspect of GeoItem other than the function itself). I haven't done anything with "spec helpers", i.e. functions that are used only in specs, in service of testing aspects of GeometricItem other than the helper function itself: * `ordered_by_shortest_distance_from * `ordered_by_longest_distance_from * `with_is_valid_geometry_column` * `start_point` * `st_distance_spheroid` * `st_npoints`
…eography column specs
…se with gazetteer Get geographic_item.spec tests to pass TODO: still (many?) more `type`-related changes needed for the new type.
containing_where_sql_geog is currently unused, so this is kind of moot, but: ST_CoveredBy does in fact "work" when you pass mixed geography and geometry parameters (see below for a guess of how). Oddly though, the results for both parameters geographic vs both parameters geometric differs - I would have thought ST_CoveredBy(a::geography, b::geography) iff ST_CoveredBy(a::geometry, b::geometry). select ST_CoveredBy(multi_polygon::geometry, multi_polygon::geometry) from geographic_items where id=421; => t select ST_CoveredBy(multi_polygon, multi_polygon) from geographic_items where id=421; => f select ST_CoveredBy(multi_polygon::geography, multi_polygon::geography) from geographic_items where id=421; => f Mixed parameters: select ST_CoveredBy(multi_polygon::geometry, multi_polygon) from geographic_items where id=421; => f suggesting mixed parameters are cast to geography select count(*) from (select ST_CoveredBy(multi_polygon::geometry, multi_polygon::geometry) from geographic_items) as a where a.st_coveredby = 't'; => 34713 (that's all multi polygons) select count(*) from (select ST_CoveredBy(multi_polygon, multi_polygon) from geographic_items) as a where a.st_coveredby = 't'; => 0 (took about half an hour to complete)
…y type Still not sure what the right way to expose the underlying shape of the geography column (is it point, polygon, etc). Right now there are lots of competing types and geo_types and object_types and multiple versions of those names; hopefully when it's just geography that can be simplified.
…raphy column Note that the public `geo_object_type` returns the underlying shape of the :geography column in this case (point, polygon, etc), i.e. callers should be unaware that the underlying type is geography. `geo_object` returns the actual shape column stored in GeographicItem, which now may be a geography shape - that fits with the new geo_object_type since if the shape of the :geography column is a polygon e.g. (i.e. geo_object_type returned :polygon) then the :geography column can be treated as a polygon. This also addresses in part the difficulty I was having absorbing the various geo/type/object references: geo_object, geo_object_type, geo_type, SHAPE_COLUMN, all kind of interdependent on one another. To me at least this seems simpler (though possibly slightly less performant if that's a concern).
`valid_geometry?` is another method only used in specs
…efs in GeographicItem A polygon, e.g., can now be found in either the polygon column or the geography column. GeographyItem currently synonymizes the column name with shape name, but now we need to be more general. This commit is just a test case, there are more changes needed. Once there's only a geography column then `shape_column_sql` will either be adjusted to return NULL in the ELSE case, or those queries where it's used can be turned into `WHERE ST_GeometryType(geography::geometry) = 'ST_shape'`
…on throughout Prior to 3.0 ST_Contains and siblings did not support GeometryCollection. There's still one change to be made, which I'll do when I add geography support at the same time.
…eographicItem The distinction being that SHAPE_TYPES includes point, polygon, etc. but NOT `geography` (which is a column name in DATA_TYPES). The point of doing that is that if you want to perform some geometric comparison on a given x = geographic_item, say you want to find all GeographicItems of shape polygon that intersect x, then you need to check all GeographicItems whose polygon column intersects x and all GeographicItems whose geography column contains a polygon that intersects x. The decision here was to push that reality as far back as possible, by pretending we're unaware of the distinction between polygon and geography-with-polygon-shape as long as possible.
…e geometry_collection
…_id with are_contained_in_item Is there any reason not to do this? It avoids an extra GeographicItem load, and are_contained_in_item was just a thin wrapper around are_contained_in_item_by_id. I changed all of the are_contained_in_item_by_id specs over to are_contained_in_item - some of the old are_contained_in_item specs were marked as deprecated but those all existed in the form of are_contained_in_item_by_id specs (which again is the same as are_contained_in_item), so I kept them all.
…phicItem As far as I can tell these seem to be currently unused, though I haven't marked them deprecated in the other shapes or among the private methods of GeographicItem.
…ir own module There are 350 lines worth. Some are used only in specs, mostly testing only themselves - feels like there should be a particular place for those, but I'm not finding it.
… to not use GEOMETRY_SQL Don't want to be using GEOMETRY_SQL outside of GeographicItem. I may well be missing something, but in regards to the comment about not wanting to load the entire GeographicItem, my thinking is that the size of the shape dwarfs the rest of a geoitem in general, and the rewrite here fetches that shape as wkb instead of as geojson (which i think would be larger?). Also I copied the snippet here from the same function in Georeference :)
…methods I think keeping the naming (st_covers) for the new methods would have been more confusing since there's now only one parameter; hopefully the new names `within(shape_sql)` and `covering(shape_sql)` are easily readable/meaning-guessable. The lack of `covering_union_of_sql` is due to the 'covering' case requiring that the input shape(s) are not included in the result (only an issue when there's only one shape/all shapes are the same). In the 'within' case input shapes are all included in the result. I'm not sure why the difference (since ST_Covers(A, B) iff ST_Coveredby(B, A)).
Based on the error message I think/hope I fixed the issue, though it wasn't causing an error for me running in docker before or after the upgrade to pg16 (maybe you're running a pg with different config?). Gazetteer should only require a shape and a (non-empty) name. |
Thanks, It works! |
@kleintom The blocker to us getting to Rails 7.2 has landed (rgeo/activerecord-postgis-adapter#405 (comment) ). We're still seeing issues on our test PR: https://github.com/SpeciesFileGroup/taxonworks/actions/runs/11672461015/job/32501118816, but I haven't poked at what might be the problem. Given that we're likely a little away from landing this branch, I think we should target it landing on the 7.2 PR. Can you please update your branch to reflect this (should largely be pointing to a couple new gems)? If there are spatial issues, which where the blockers in the adapter getting to 7.2, then your geo work/context might be the best place to discover these. |
…removed class Actual behavior should be unchanged.
So that the new geo code can be tested against the 7.2 activerecord-postgis-adapter https://github.com/SpeciesFileGroup/taxonworks/pull/4006/files
I'm not sure what's changed in 7.2 to cause this error: PG::IndeterminateDatatype: ERROR: could not determine data type of parameter $1 Changing `point.x` to `point.x.to_s` didn't help, changing `?` to `?::text` didn't help.
I copied @LocoDelAssembly's code to update to 7.2 and added what seems like a reasonable fix to get specs passing again (though I don't know exactly what changed in 7.2 to require it). It looks like ci won't run automatically anymore though since Gemfile.lock is going to be in conflict until the 7.2 pr lands. |
@LocoDelAssembly if you have time can you look into the Gemfile/CI issue please. @kleintom This might be the thread re the issue: rgeo/activerecord-postgis-adapter#402 (comment); and the patch: rgeo/activerecord-postgis-adapter#417 ... maybe |
The make_valid process on a polygon, e.g., can result in line_strings or points - here we choose to drop any of those lower dimensional shapes created during the make_valid process. I'd rather remove this as an option so that we have consistent behavior across all shape creation (at least for now). At least that's what I thought would happen, but on EcoRegions EcoID=753 - which is a multipolygon - ST_MakeValid still returns a geometry collection containing a multipolygon and a multilinestring...
…mping This is in addition to previously documented longitude shifting.
…for GEOS >= 3.10 I think we're probably already at sufficient postgis, which is 3.2.0 (from https://git.osgeo.org/gitea/postgis/postgis/raw/tag/3.2.0/NEWS: 'ST_MakeValid(geometry, options) allows alternative validity building algorithms with GEOS 3.10')
@kleintom We saw that there is a migration to remove |
…icItem for now Its absence is an issue when switching back and forth between dev branches, and there's no reason it needs to be removed right now.
No problem, I just needed to make |
I was going to refactor code I added to normalize longitudes of new Points - to get this GI validation to pass: taxonworks/app/models/geographic_item/point.rb Lines 27 to 31 in a804e9c
|
Reading this I had a momement of panic thinking maybe we had swtiched GeographicItem to Geometry type :( !, which would have been bad.
I don't forsee needing either Multipoints or Collections thereof. Let's let users request it before we move on that front (I can't wrap my mind on what it would mean to assign such, but maybe, like in a Lot of fish lumped into a single barrel).
I don't recall why. It might be related to downstream checks at GBIF requiring such. Mapping might also have been an issue. If we can normalize on render (e.g. to map, or DwC, or...) that's a viable solution too, as long as we persist meaning unambiguosly. |
If the type column exists, rails assumes its value is used for STI and then it wants all of the STI types that go along with, e.g., still-existing GA GIs. `inheritance_column` is the rails mechanism for changing that behavior.
…select_one Simpler cleaner more explicit calls
…sing shape Not knowing what the caller might do with the circle (or if they'll be aware of the issue), I feel like this is prudent (though in the only current case where circle is used it means make_valid_non_anti_meridian_crossing_shape will be called twice).
…crosses_anti_meridian? The xspecify of mysterious (to me) sometimes failures of crosses_anti_meridian? when one of the vertices has longitude (exactly, and only) 0 is disturbing, and I don't know exactly when it happens. If it turns out to cause problems, one option would be to implement crosses_anti_meridian? with the following pseudo-code (written here for polygons): def crosses_anti_meridian?(poly: p) prev = nil for each |vertex of exterior ring of p: v| if prev and line(prev, v) intersects anti-meridian # work around a false positive: if v.lon == 0 || prev.lon == 0 continue end return true end prev = p end return false end
I'll have more to say, probably tomorrow, on the last new commit, but I wanted to get the first new commit out today since it fixes my previous attempt to restore GI |
I finally got some notes written up (and figured out, I think) for the Ecoregions2017 shapefile - I put them here for now: https://gist.github.com/kleintom/c4c851de52380b0d08b6950e198aa8ba I don't see a way around the first two RGeo-related issues (given our factory), that said I think those issues are both going to be quite rare, involving shapes crossing +/- 85.051127 latitude (two shapes involving Antarctica in the Ecoregions case) and lines intended to take the long way around the globe. The third issue regarding what happens when imported shapes are made_valid has more wiggle room I think, maybe more reporting/warnings would be helpful there. Lastly I'll point out the following postgis oddity (x)spec'ed in my last commit above: select ST_Intersects(ST_GeographyFromText('LINESTRING (91 10, 0 0)'), ST_GeographyFromText('LINESTRING (180 89.0, 180 -89.0)'));
select ST_Intersects(ST_GeographyFromText('LINESTRING (91 0, 0 10)'), ST_GeographyFromText('LINESTRING (180 89.0, 180 -89.0)')); The first returns false, as expected (for me at least): the line I'm ready for more shapefile issues is anybody runs into some :-) |
@kleintom Thanks very much. I just got my hands on some shapefiles and hope to hit the interfaces a little today. We also will collectively explore the UI during tomorrows help sessions, so a first wave of feedback should be coming shortly. |
@kleintom can you please add |
Thanks Hernán for the reminder.
I'm submitting a first draft of
All of those are unfinished (and possibly somewhat broken), but I wanted to put something out to make sure I'm on the right track. All/any feedback welcome, but I'm not looking for a full review at this point - I consider this all to be in the early stages, so 'big' changes are fine with me.
Some notes/issues/questions I've encountered so far:
shape_column_sql(shape)
returns, e.g. the polygon column if that's present, or the geography column if that column holds a polygon. This will replace many of the existing CASE statements once everything switches over to a single geography column.select_self
.I put all of those in a separate GeographicItem::Deprecated module for now, but it's still included in GeographicItem (for now) and I don't have any real opinion on whether that's the way forward or not.
The 'New Gazetteer' task allows the user to add multiple shapes to a new gazetteer before saving (i.e. they can draw multiple shapes in leaflet, for example, for one gazetteer) - they all get bundled into a single collection behind the scenes. That differs from the georeference model I think, but fits more closely with GeographicArea shapes (I think) - this could also be a path to combining existing GeographicItems into a Gazetteer, depending on how we want to store that. Leaflet shapes and WKT shapes (and eventually other input methods) can be combined in one gazetteer - I don't keep track of which shapes came from which input types on the backend.
As @mjy suggested, I've gone with a write once read/replace pattern for the geographic item shape of a gazetteer.
I've yet to take any position on meaning of Gazetteer parent in the code.
Most of the crud create/list/show stuff should be mostly working, but it's not all finished yet.