Hyphenation error on German word "Fortschritt" #24
Comments
Same thing with "medizinische": it is not medizini-sche, it is me-di-zi-nisch. |
A less maintenance-heavy solution would be to use the system-installed (myspell/hunspell) hyphenation dictionaries, if available. You can pass a dictionary file through the filename= argument when calling Pyphen.
@liZe would you be interested in a PR that falls back to system-installed hyphenation dictionaries? That way distro packagers could also opt not to install the dictionary files and rely fully on the more up-to-date system ones. |
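The fallback suggested in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the directory list and the `hyph_<lang>.dic` file naming are common conventions but differ between distributions, and the helper names `find_system_dictionary` and `make_hyphenator` are hypothetical, not part of Pyphen. The only Pyphen calls used are the `filename=` and `lang=` constructor arguments.

```python
from pathlib import Path

# Assumed locations of system hyphenation dictionaries; distro-specific.
SYSTEM_DIRS = ("/usr/share/hyphen", "/usr/share/myspell", "/usr/share/hunspell")


def find_system_dictionary(lang, directories=SYSTEM_DIRS):
    """Return the first hyph_<lang>.dic found in the directories, or None."""
    for directory in directories:
        candidate = Path(directory) / f"hyph_{lang}.dic"
        if candidate.is_file():
            return candidate
    return None


def make_hyphenator(lang, directories=SYSTEM_DIRS):
    """Prefer a system dictionary; otherwise use the one bundled with Pyphen."""
    import pyphen  # imported lazily so the lookup above works without Pyphen

    path = find_system_dictionary(lang, directories)
    if path is not None:
        return pyphen.Pyphen(filename=str(path))
    return pyphen.Pyphen(lang=lang)
```

Passing `directories=()` would force the bundled dictionary, which gives callers a simple way to choose the dictionary source.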
We have a patch in Fedora which does something similar. The Fedora package does not ship any dictionaries from pyphen but that has its own drawbacks:
Therefore I'm planning to use pyphen's dictionaries in a future update (assuming I finally get the privileges to update pyphen). Personally, if you were to support system-provided dicts, I'd like to see a way for callers to choose the dictionary source, to prevent test failures due to outdated system dicts. |
I can't see how it would make it any better. On my system I get the same (wrong) result. I'm using Gentoo with hunspell 1.7.0 and pyphen 0.9.4. |
Encountered the same issue while making a patch for Gentoo. I was thinking of rewriting the fallback mechanism so that a request for lang would default to lang_LANG. This would work for "de". For "en", I would pick the largest available territory dictionary.
Other distros seem to have the same issues. However, the burden of keeping all dictionaries up to date now falls on the Pyphen maintainers. Also, some dictionaries are not updated upstream.
I imagine changing the behaviour of pyphen will result in a version update, and perhaps in incompatibilities with other applications. In part, this would be similar to introducing pyphen-specific exceptions (where applications using pyphen would expect default Python exceptions).
Continuing on the previous point, if you have to test from another application, I would always use the filename= argument to specify a static dictionary file which can be controlled from the testing application. |
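The fallback rule proposed earlier in this comment (a bare `lang` defaults to `lang_LANG`, otherwise the largest territory dictionary wins) might be sketched as follows. This is a minimal sketch of the idea as stated, not Pyphen's actual lookup; the function name `resolve_language` and the `available` mapping of tag to dictionary size are hypothetical.

```python
def resolve_language(lang, available):
    """Pick a dictionary tag for `lang`.

    `available` maps language tags (e.g. "de_DE") to dictionary sizes in
    bytes. Returns the chosen tag, or None if nothing suitable exists.
    """
    if lang in available:
        return lang  # exact match always wins
    if "_" not in lang:
        # Rule 1: "de" -> "de_DE"
        default = f"{lang}_{lang.upper()}"
        if default in available:
            return default
        # Rule 2: no "en_EN", so take the largest "en_*" dictionary
        candidates = [tag for tag in available if tag.startswith(lang + "_")]
        if candidates:
            return max(candidates, key=lambda tag: available[tag])
    return None


sizes = {"de_DE": 100, "de_CH": 50, "en_GB": 200, "en_US": 300}
print(resolve_language("de", sizes))  # de_DE
print(resolve_language("en", sizes))  # en_US
```

Note that the sketch only ever expands a bare language code; it never strips or swaps a territory.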
A word of caution, as I see this is often done wrong: language tag substitution and expansion doesn't work like that; you cannot simply add or remove a subtag. For example, Swedish and Finnish are both spoken in Suomi/Finland. By removing a subtag you'd risk changing the language completely, and by expanding one you'd face a non-trivial choice between (here: at least) two. |
(faulty output from dic.inserted) I would like to understand how the wrong hyphenation comes about. This doesn't seem to be about the .dic file, really. The single |
results in: 'Fort-s-chritt'
The correct answer would be: 'Fort-schritt'
Although LibreOffice uses the same dictionary, the result seems to be correct there.
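As background on how a result like 'Fort-s-chritt' can arise at all: hunspell-style hyphenation dictionaries are pattern-based (Liang's algorithm). Digits inside a pattern score the gaps between letters, the highest score per gap wins, and an odd score allows a break. The sketch below is a simplified implementation with three made-up patterns (`t1s`, `s1c`, `s2ch`). They are not taken from the real German dictionary, and this is not a diagnosis of the actual bug; it only illustrates how a missing or ignored inhibiting pattern could leave a stray break.

```python
def parse_pattern(pattern):
    """Split a pattern such as 's2ch' into letters 'sch' and weights [0, 2, 0, 0]."""
    letters = "".join(ch for ch in pattern if not ch.isdigit())
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # weight of the gap before letter `pos`
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left=2, right=2, hyphen="-"):
    """Insert `hyphen` at every gap whose highest pattern weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] is the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights is not None:
                for offset, weight in enumerate(weights):
                    scores[start + offset] = max(scores[start + offset], weight)
    pieces, last = [], 0
    # Keep at least `left` letters before and `right` letters after a break.
    for k in range(left, len(word) - right + 1):
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return hyphen.join(pieces)


# Hypothetical patterns: without the inhibiting 's2ch', both gaps are odd.
print(hyphenate("Fortschritt", ["t1s", "s1c"]))          # Fort-s-chritt
print(hyphenate("Fortschritt", ["t1s", "s1c", "s2ch"]))  # Fort-schritt
```

In this toy setup the even weight in `s2ch` overrides the odd weight from `s1c`, which removes the break between "s" and "ch".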