Unique user-added node IDs #65
Hm, I think here's where you'd want to tread carefully, because it'll be nice to eventually be able to re-export the nodes/maps/etc. back to OSM or shapefiles, and we'd want to be able to (i) distinguish between user-added and original OSM IDs, and (ii) decide what to do about user-added information when re-exporting. One way out (which seems reasonable) is to maintain our own set of IDs (which is why I'm still unsettled about #58), running from 1 to N (where N = # of nodes). That way it is easy to add nodes (just I'm not sure how
So, you probably won't like how things work currently, then. Any time a node is added, anywhere in the package, it is done by calling the Functions I have changed addNewNode() on my local copy to take an optional argument I hear you on this idea about maintaining our own set of IDs. I don't think it's critical, since anyone who really cares can still re-load their original OSM data and check which nodes didn't exist in the original. But tracking it might be more convenient. Currently
Mmm, I can see what you're saying, and I'm not as vexed about it as you might imagine -- feel free to continue adding functionality (and tests) if it helps with your work! I'm just wondering longer-term about the way we'll want to deal with everything (by looking at the way things are done in Python/R), and offload as much of the data munging to DataFrames as possible. I'll roll up my sleeves one of these days to get them done, but school and research have caught up with me, so it'll take a while.
Experimenting with our own 1-N node indices to try out different graph types sounds interesting. Ideas you've mentioned elsewhere about bringing in DataFrames to simplify data analysis sound great, but it would be an extremely heavy dependency for this use case (even if DataFrames development were much farther along).
Oh we don't have to bring in DataFrames anytime soon, haha, just suggesting it as a way of thinking about the way we organize the indices/data in OSM internally, so that
Although they are heavy dependencies, the introduction of DataFrames really shouldn't be only for this particular use case; it should be part of a more general pattern of working with data, based on the lessons picked up from the GIS community in Python. In particular, I quote from the UrbanSim project:
Longer term, I'm thinking of OpenStreetMap not as a standalone package, but as a package for modeling and handling OSM data, as part of a larger processing pipeline for working with geospatial data in Julia:
It doesn't require over-engineering on the part of any individual package, just that we start thinking a little harder about the interoperability of Julia packages -- and I think we really aren't too far away from performing operations that were traditionally only possible with an RDBMS setup (i.e. PostgreSQL + PostGIS + pgRouting).
I like what you're saying about the ecosystem. Still not sold on using DataFrames for this even once we're using it elsewhere (even though as one of the DataFrames developers, I'd like to see them used all over ;)). I've got some more general ecosystem questions I'd like to hear your vision on. I'll open an issue in Turf.
addNewNode!() now starts the search for a new node ID at hash(location) by default, rather than always from 1. This should speed it up and make node IDs more unique. Note that the hashes are not consistent between julia sessions. I had to modify the tests slightly, now that the node IDs on boundaries are not guaranteed to have the same IDs between sessions. See discussion at Issue #65. Signed-off-by: Ted Steiner <[email protected]>
I went ahead and committed the change I mentioned earlier. It's pretty minor in the grand scheme of things, but I think it's better than what I had before. And of course, I just noticed I forgot to actually rename the function...
(Forgot to do the rename last commit.) Related to Issue #65. Signed-off-by: Ted Steiner <[email protected]>
So I just noticed this, and I thought it was kind of interesting. The update I made led to 1 fewer line being covered by testing, according to Coveralls. When I looked into it, this was because Also, to clarify one of my earlier comments, there is no function I also had to remove a couple of pieces from the tests, where we were testing to see that specific added nodes had specific IDs (due to cropping). I don't think this is something we want to enforce or depend on, so I just removed them. It also seems to have disrupted some of the edge IDs in These node IDs aren't guaranteed to be truly unique. I thought of maybe hashing the LatLonAlt position as another option to make them truly unique. But I think it solves the problem much better than the previous version, so I'm going to close this. If we ever need truly unique IDs we can open a new issue. The big downside there is that we then have to convert all OSM IDs and maintain a mapping between the two. I agree that DataFrames do sound like an interesting option further down the line. Unfortunately I don't really have much to add to @yeesian's comments, as I don't have any experience with formal databases, but I'm open to learning more about them.
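For readers following the 1-to-N idea raised earlier in the thread, here is a minimal sketch of what "maintaining a mapping between the two" could look like. It is not code from the package: `NodeIndex`, `add_osm_node!`, `add_user_node!`, and `is_user_added` are hypothetical names invented for illustration, and the OSM IDs are made-up sample values.

```julia
# Hypothetical index (not part of the package): internal IDs run from 1 to N,
# and each internal ID remembers whether it came from OSM.
struct NodeIndex
    osm_ids::Vector{Union{Int,Nothing}}  # osm_ids[i] is the OSM ID of node i, or nothing if user-added
    internal::Dict{Int,Int}              # OSM ID => internal ID
end

NodeIndex() = NodeIndex(Union{Int,Nothing}[], Dict{Int,Int}())

# Register a node that came from OSM data and return its internal ID.
# Registering the same OSM ID twice returns the existing internal ID.
function add_osm_node!(idx::NodeIndex, osm_id::Int)
    haskey(idx.internal, osm_id) && return idx.internal[osm_id]
    push!(idx.osm_ids, osm_id)
    idx.internal[osm_id] = length(idx.osm_ids)
    return length(idx.osm_ids)
end

# Register a user-added node, which has no OSM ID, and return its internal ID.
function add_user_node!(idx::NodeIndex)
    push!(idx.osm_ids, nothing)
    return length(idx.osm_ids)
end

# A node is user-added exactly when it has no recorded OSM ID.
is_user_added(idx::NodeIndex, i::Int) = idx.osm_ids[i] === nothing

idx = NodeIndex()
a = add_osm_node!(idx, 61317384)  # sample OSM ID, for illustration only
b = add_osm_node!(idx, 61317500)  # sample OSM ID, for illustration only
c = add_user_node!(idx)
```

With a structure along these lines, re-exporting could write `osm_ids[i]` for original nodes and decide separately what to do with entries where `is_user_added` is true. The cost noted in the comment above remains: every OSM ID has to be translated on the way in and on the way out.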
Just a little sneak peek at what might be possible: I think if we can move towards modelling geometries (see also: #64), i.e.
there are functions in Turf (known as
There are some times when a user might want to add a new node for a specific location, and others when functions must add new nodes (most notably cropMap!, but maybe others, too). I originally implemented the function to handle this, addNewNode(location), to just count up from 1 until it finds an available node ID. This is obviously inefficient, but I've just never really cared that much. But another problem arises if you try to combine maps - every cropped map has a node "1." Is there an easier way to have unique node IDs?
One thought I had was to do something like hash((location.east,location.north)). I'm not sure whether we'd need to use the second term or not. The only issue I can think of is that if you want to interact with these IDs by name, they are many digits and harder to keep track of mentally.
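The two strategies described in this issue can be compared with a small sketch. This is an illustration under stated assumptions, not the package's actual `addNewNode` code: `Location`, `next_id_from_one`, and `next_id_from_hash` are hypothetical names, and the node table is assumed to be a plain `Dict` keyed by integer ID.

```julia
# Hypothetical stand-in for the package's location type (assumption).
struct Location
    east::Float64
    north::Float64
end

# Count-up approach: start at 1 and take the first unused ID.
function next_id_from_one(nodes::Dict{Int,Location})
    id = 1
    while haskey(nodes, id)
        id += 1
    end
    return id
end

# Hash-seeded approach: start the search at a hash of the coordinates.
function next_id_from_hash(nodes::Dict{Int,Location}, loc::Location)
    # Shift right by one bit so the unsigned hash fits in a non-negative Int.
    id = Int(hash((loc.east, loc.north)) >> 1)
    # Still probe forward, because a hash can collide with an existing ID.
    while haskey(nodes, id)
        id = id == typemax(Int) ? 1 : id + 1
    end
    return id
end

map_a = Dict{Int,Location}()
map_b = Dict{Int,Location}()
loc_a = Location(10.0, 20.0)
loc_b = Location(30.0, 40.0)

# Counting from 1 hands out the same first ID in two independent maps.
id_a1 = next_id_from_one(map_a)
id_b1 = next_id_from_one(map_b)

# Hash-seeded IDs depend on the coordinates, so they are very likely to differ.
id_a2 = next_id_from_hash(map_a, loc_a)
id_b2 = next_id_from_hash(map_b, loc_b)
```

The count-up version gives every fresh map a node 1, which is the clash described above when maps are combined. The hash-seeded version makes such clashes unlikely rather than impossible, so the probing loop is kept, and the resulting IDs are long numbers that are harder to read by eye.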