processing never finishes on small dataset #38
Comments
Forgot to mention: the output goes as far as:
So it must be happening after this message, which of course you can deduce from the backtrace above, so perhaps this was not needed.
I've been debugging this a bit further, python is not my
It takes a very long time to process the first 5000 points, unusually long IMHO:
There are also quite a few duplicates in this dataset, so it has to work hard, but it doesn't make much sense that it is this slow. I'll hack on this a bit more to find out where the performance hog is. When we parse the road database, it contains a lot more points:
But it seems that dataset has no duplicates, so it goes really fast according to the debug logging. The memory footprint, however, is exactly the same as when we parse the address data.
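To find where the time goes, one option is Python's built-in cProfile module. The helper below is a hypothetical sketch (the name profile_call and its top parameter are illustrative, not part of ogr2osm): it runs a single function under the profiler and returns the most expensive calls, sorted by cumulative time.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the text of the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    # Write the statistics into a string instead of stdout.
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()
```

For the whole script, running it as python -m cProfile -s cumulative /usr/local/bin/ogr2osm/ogr2osm.py CrabAdr_parsed/CrabAdr.shp should give a similar report without changing any code.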
The bottleneck is on line 23 of geom.py, which is only executed for a duplicate node:
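The thread does not show what line 23 of geom.py does, so the following is only a sketch of a common fix for this kind of slowdown, not the actual ogr2osm or ogr2pbf change: instead of comparing each duplicate against a list of previously seen nodes (linear work per lookup, quadratic overall), index the nodes by coordinates in a dict so each lookup is constant time on average. The (node_id, lon, lat) tuple layout and the function name merge_duplicate_points are assumptions for illustration; rounding to 7 decimal places matches the precision OSM uses to store coordinates.

```python
def merge_duplicate_points(points, precision=7):
    """Map each duplicate node id to the id of the first node at the same spot.

    points: iterable of (node_id, lon, lat) tuples (hypothetical layout).
    Runs in a single pass, with one dict lookup per point.
    """
    first_seen = {}    # rounded (lon, lat) -> id of the first node seen there
    replacements = {}  # duplicate node id -> surviving node id
    for node_id, lon, lat in points:
        key = (round(lon, precision), round(lat, precision))
        if key in first_seen:
            replacements[node_id] = first_seen[key]
        else:
            first_seen[key] = node_id
    return replacements
```

One caveat of keying on rounded values: two points that differ by less than the tolerance but fall on opposite sides of a rounding boundary are not merged, so this only catches exact (or near-exact) duplicates.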
The changes work in roelderickx/ogr2pbf: processing time is down to around 5 minutes. I'll try to backport the changes to ogr2osm and create a pull request.
Hey, thanks a lot for looking into this and bringing #51 to my attention. It's been a while since I hacked on this, although the tool still exists and is in use. Really cool that you took the time for this. ogr2pbf is one of the tools in the chain that prepares data for human-assisted import into OSM via JOSM: https://staging.grbosm.site/#/ (zoom low enough, on the northern part of Belgium, for the layer to get pulled from Postgres). AFAIK, I solved it by just living with the duplicates; later on in the preprocessing chain they got resolved, but I don't remember exactly how. Anyway, pretty soon I'll be doing a fresh data-processing run (which is entirely automated), so I'll give it a go once the fix is backported and replace my fork, so it gets tested. The whole preprocessing of the data takes about 6 hours on a decent Google Cloud node. Big thanks, Roel.
Hi,
I'm having a strange issue with ogr2osm. We use it to prepare the Belgian open address database shapefile for loading into PostgreSQL with the OSM toolset, as follows:
Download of the address list:
https://downloadagiv.blob.core.windows.net/crab-adressenlijst/Shapefile/CRAB_Adressenlijst.zip
It's not too big IMHO. This used to work without a glitch a few months ago (it's automated), and neither the ogr2osm code nor the wrapper script has changed since then (the data, of course, has).
After zip extraction we first use ogr2ogr on it:
/usr/local/bin/ogr2ogr -s_srs EPSG:31370 -t_srs EPSG:4326 CrabAdr_parsed CRAB/Shapefile/CrabAdr.shp -overwrite
That step works. Then we run ogr2osm like this:
/usr/local/bin/ogr2osm/ogr2osm.py --idfile=ogr2osm.id --positive-id --saveid=ogr2osm.id CrabAdr_parsed/CrabAdr.shp
This keeps on going until we reach this state of the machine:
The machine still has lots of memory available:
The memory growth stops at that point, one CPU is 100% busy, and it never finishes. I left it running for over 18 hours (which is already abnormal, as it used to take less than half an hour), so it effectively hangs.
I can't seem to strace this process either; I've never seen this error message before when stracing:
and then it stays silent. I've never seen those x32 mode messages before, and I've been around Unix for more than 20 years.
Ctrl-C does work, however, and shows this:
which tells me this happens in the mergePoints() function.
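Since strace could not attach and Ctrl-C ends the run, another way to see where a stuck run is spending its time is Python's faulthandler module (standard library since Python 3.3, so this applies to the Python 3.5 attempt, not 2.7). A minimal sketch, assuming a POSIX system: register a handler once at startup, then send kill -USR1 <pid> to the busy process to print every thread's Python traceback while it keeps running. The helper name enable_stack_dump_on_signal is hypothetical, and calling it from ogr2osm.py would be a local patch, not an existing option.

```python
import faulthandler
import signal
import sys


def enable_stack_dump_on_signal(out=sys.stderr, signum=signal.SIGUSR1):
    """Print all threads' Python tracebacks to `out` whenever `signum` arrives.

    The handler runs at C level, so it fires even while the interpreter is
    busy in a long pure-Python loop, and the process keeps running
    afterwards. POSIX only: faulthandler.register() is not available on
    Windows.
    """
    faulthandler.register(signum, file=out, all_threads=True)
```

Repeating the kill a few times a minute apart shows whether the run is making progress or spinning in the same frame.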
I also tried Python 3.5 instead of 2.7, with the same symptom.
This is part of an automated tool stack that uses Terraform with Google Cloud to crunch the data. You could reproduce the entire setup by building an identical machine from the repository below:
https://github.com/gplv2/crab-osm-qa
The bash script that contains this code is here: https://github.com/gplv2/crab-osm-qa/blob/master/helpers/process_source.sh
Could you give your 2 cents on this issue, please? I've always had success with the ogr2osm tool; in fact, in the same script we parse the open Belgian road database as well, and that passes fine. It's just the address database that shows this behavior.
I would love to get some suggestions at this point, and I appreciate this a lot. Thank you for your work as well; it has proven essential for the Belgian OSM community.
Greetings,
Glenn