Add libpostal address detection #23
Comments
This test should run after the stdnum test and the date test. That way, we already know the candidate is neither a stdnum nor a date.
The idea looks fine to me, and using libpostal excluding Except for the base regex, which in reality is not fully language-independent. The current one proposed is
That may work for US English, though it might miss the state (e.g. And other languages can diverge more: These are a few examples of sequences I came up with:
Other countries might fit these patterns. For instance, Portugal is quite similar to Spain (but Brazil puts the postcode after the city). The Wikipedia page about addresses has a good collection: https://en.wikipedia.org/wiki/Address
@shamikbose see above ^^.
Thanks, @ontocord! I will look into it this weekend.
Add a regex for basic potential addresses, such as \d+ followed by \s+, then \w{5,30}, a comma, and another \d+. Then check that there are no stopwords within the \w span, and feed the whole match to libpostal to check whether it is an address. libpostal will label the components (house, road, etc.). We need to check that there is a road or a similar component; "house" alone doesn't really tell us anything, as it is almost always matched.
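The steps above could be sketched roughly as follows. This is only a minimal illustration, not a proposed final implementation. Several things here are assumptions: the names `CANDIDATE_RE`, `STOPWORDS`, `INFORMATIVE_LABELS`, `find_candidates`, `looks_like_address` and `detect_addresses` are hypothetical, the stopword list is a tiny placeholder, the set of labels that count as meaningful is a guess, and spaces are allowed inside the \w span so that multi-word street names (and stopwords) can appear at all. The parser is passed in as a callable; by default it falls back to `parse_address` from the libpostal Python bindings (`postal.parser`), which returns a list of (value, label) pairs.

```python
import re

# Digits, whitespace, a 5-30 character name, a comma, then more digits.
# Allowing spaces inside the name is an assumption, not part of the proposal.
CANDIDATE_RE = re.compile(r"\b\d+\s+([\w ]{5,30}),\s*\d+\b")

# Placeholder stopword list (assumption); a real one would be per language.
STOPWORDS = frozenset({"the", "and", "of", "in", "to", "is", "was", "for", "with"})

# Labels treated as meaningful (assumption). "house" is deliberately left out.
INFORMATIVE_LABELS = frozenset({"road", "city", "postcode"})


def find_candidates(text):
    """Return regex matches whose name part contains no stopwords."""
    candidates = []
    for match in CANDIDATE_RE.finditer(text):
        words = match.group(1).lower().split()
        if not any(word in STOPWORDS for word in words):
            candidates.append(match.group(0))
    return candidates


def looks_like_address(candidate, parse):
    """Accept a candidate only if the parser reports an informative label."""
    labels = {label for _value, label in parse(candidate)}
    return bool(labels & INFORMATIVE_LABELS)


def detect_addresses(text, parse=None):
    """Run the regex filter, then confirm each candidate with the parser."""
    if parse is None:
        # Requires libpostal and its Python bindings to be installed.
        from postal.parser import parse_address as parse
    return [c for c in find_candidates(text) if looks_like_address(c, parse)]
```

Passing the parser in as an argument keeps the regex and stopword steps testable without libpostal installed, and makes it easy to swap in per-language regexes later.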