Correct matches not treated as candidates as names seem too different #11

Open
hbruch opened this issue Jan 7, 2023 · 1 comment

Comments

@hbruch
Member

hbruch commented Jan 7, 2023

Possible match candidates are ignored if their names are very different, i.e. if their bigram similarity is below the threshold (currently 0.3).
In cases like Urbach (de:08119:5789:1:1) vs. Urbach (b Schorndorf) (n61278864), the location specifier in parentheses pushes the bigram similarity below the threshold, so the correct match candidate is ignored:


2023-01-07 10:41:16 INFO     Ignore Urbach (de:08119:5789:1:1)  Urbach (b Schorndorf) (n61278864) with distance 49.98712924591864 and name similarity 0.2857142857142857. Platform matches? 0.85 as name distance low
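
To illustrate why the parenthesized suffix hurts, here is a minimal sketch of a bigram comparison. It assumes a Jaccard similarity over lowercased character bigram sets; the matcher's actual measure is not shown in this issue and evidently differs in detail, since the log reports 0.2857 for this pair. The function names are hypothetical.

```python
def bigrams(text: str) -> set[str]:
    """Return the set of character bigrams of the lowercased string."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two bigram sets (an assumed measure)."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        return 1.0  # two empty or one-character names: treat as identical
    return len(set_a & set_b) / len(set_a | set_b)


THRESHOLD = 0.3  # threshold quoted in this issue

score = bigram_similarity("Urbach", "Urbach (b Schorndorf)")
# All five bigrams of "Urbach" are shared, but the suffix adds many more,
# so the ratio drops below the threshold and the pair would be ignored.
print(score < THRESHOLD)
```

Under this assumed measure the short name is fully contained in the long one, yet the extra bigrams contributed by "(b Schorndorf)" dominate the union.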

We should possibly normalize OSM and stop names before comparing them, e.g. by scrubbing parenthesized parts and other insignificant name parts.
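
A minimal sketch of such a normalization step, assuming it is enough to drop parenthesized parts and collapse whitespace. `normalize_name` is a hypothetical helper, not an existing function of the matcher, and which other name parts count as insignificant is left open here.

```python
import re

# Matches one parenthesized group plus any whitespace directly before it.
_PARENTHESIZED = re.compile(r"\s*\([^()]*\)")


def normalize_name(name: str) -> str:
    """Drop parenthesized parts and collapse runs of whitespace."""
    without_parens = _PARENTHESIZED.sub(" ", name)
    return " ".join(without_parens.split())


print(normalize_name("Urbach (b Schorndorf)"))  # Urbach
print(normalize_name("Urbach"))                 # Urbach
```

Applied to both the stop name and the OSM name before the bigram comparison, this would make the two Urbach names identical. One trade-off to keep in mind: the specifier sometimes disambiguates places with the same name, so the scrubbed names should probably only feed the similarity check while the distance check stays in place.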

@derhuerst
Contributor

db-clean-station-name was built specifically to remove these (b *) suffixes, among other cleanups. Currently, however, it fails on Urbach (b Schorndorf).
