Corrupt WOF records are not skipped during import #405
Comments
And if this would be a good first issue, let me know. I'm serious about contributing. At some point I wanna have another Pelias importer for custom user data anyway. Getting to know the stack with smaller issues would be helpful. But then a quick pointer in the right direction would be appreciated :) |
Hi @nilsnolde. What do you think the issue is here? Do you think maybe it was just that the server was having a bad day and now it works, or is there a difference between executing the commands manually on the CLI vs. using the Pelias scripts? |
Well, it has to be a network issue. I'm not saying there's a flaw in the code; after all, I executed the exact same command manually without problems. My question here is: couldn't the importer just skip files it can't find or read, instead of failing entirely? |
We had a bug a while back where one layer's data was being downloaded twice, and the two downloads interacted badly together, so something like that could happen still. However, in general I think all our importers need to support a Sometimes, you want a build to fail if anything at all goes wrong. Sometimes you want to ignore failures to download/parse data. Most Pelias code was written to fail if anything goes wrong, since that's what we wanted for Mapzen Search, but I think we need to support both modes everywhere. |
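To make the two modes concrete, here is a minimal sketch of how a single switch could select between "fail the whole build" and "skip and continue". It is not taken from the Pelias codebase: `runAll`, `failOnError`, and the task objects are hypothetical names used only for illustration.

```javascript
// Hypothetical helper, not Pelias code: run every task and honor one switch.
// failOnError = true  -> strict mode, the first failure aborts the whole run.
// failOnError = false -> lenient mode, failing tasks are recorded and skipped.
function runAll(tasks, failOnError) {
  const done = [];
  const skipped = [];
  for (const task of tasks) {
    try {
      task.run();
      done.push(task.name);
    } catch (err) {
      if (failOnError) {
        // Strict mode: surface which task broke the build.
        throw new Error(`import of ${task.name} failed: ${err.message}`);
      }
      // Lenient mode: remember the failure and keep going.
      skipped.push(task.name);
    }
  }
  return { done, skipped };
}
```

In lenient mode the caller can still log the `skipped` list at the end, so ignored failures do not go unnoticed.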
Ah, damn it, there's the switch right there!! And of course it's set Thanks @orangejulius again! And for the record: I agree, having a switch is really good. At least for importers importing heaps of files like |
Oh, I didn't know it would actually work :) There's a lot of ways to download files in this repo, and I bet in at least some of the code paths, failures are still not ignored. So if you do see that, let us know. |
Hm ok, sorry, apparently setting
Any idea? |
I also hit a download error (in a file that downloads with wget in a few seconds) and so set missingFileAreFatal to false,
should be broken into several lines. I don't know if wget is more reliable than curl. |
OK, by changing to wget and local files this runs to completion. I will do a PR.
|
Hi,
there were quite a few corrupt WOF postal code records during download, e.g.
That was a few days ago. Weirdly,
curl https://dist.whosonfirst.org/bundles/whosonfirst-data-postalcode-ca-latest.tar.bz2 | tar -xj --strip-components=1 --exclude=README.txt
works now if done manually, even though the timestamp of that record hasn't changed since July.
There's a bunch of bzip2 and tar errors in the logs for the above command. Same for Japan, Portugal and GB postal codes. When downloading is finished and the importer tries to load them into ES, the following fatal error occurs:
Then the WOF importer ends on error instead of skipping the Canada postal codes.
Any chance of improving that behavior?
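Since the same bundle failed during the import but downloaded fine by hand later, one possible improvement is to retry a bundle a few times before treating it as failed. The sketch below only illustrates that idea: `withRetries` and its parameters are hypothetical, and it is synchronous for brevity, whereas a real importer would be asynchronous.

```javascript
// Hypothetical helper, not Pelias code: call `attempt` up to `maxAttempts`
// times and return its first successful result. If every attempt throws,
// rethrow the error from the last attempt.
function withRetries(attempt, maxAttempts) {
  let lastError = new Error('maxAttempts must be at least 1');
  for (let i = 1; i <= maxAttempts; i++) {
    try {
      // The attempt number is passed along so callers can log it.
      return attempt(i);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Here `attempt` would wrap the download and extraction of one bundle. Whether an error that survives all retries aborts the import or only skips that bundle is then a separate decision.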