πŸ› Fix apostrophe handling #74

Closed
wants to merge 1 commit into from

@Jamim Jamim commented Jan 28, 2024

Hello @marcoagpinto,

Thank you for maintaining this set of dictionaries! 🙇🏼

Due to the lack of the ’ character in WORDCHARS, Hunspell can't properly handle words with apostrophes.

$ cat test.txt
I've been surprised that it doesn't work as expected!
$ hunspell -l -d en_AU test.txt
ve
doesn
$ hunspell -l -d en_CA test.txt
ve
doesn
$ hunspell -l -d en_GB test.txt
$ hunspell -l -d en_US test.txt
ve
doesn
$ hunspell -l -d en_ZA test.txt
ve
doesn

As you can see, it works only for en_GB since there is a kind of apostrophe in WORDCHARS already.
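
The effect can be illustrated with a simplified model of word splitting. This is only a sketch, not Hunspell's actual tokenizer: the `tokenize` function and its `wordchars` parameter are hypothetical names, and the sample sentence is assumed to use the typographic apostrophe ’.

```python
def tokenize(text, wordchars=""):
    """Split text into words.

    A character belongs to a word if it is alphabetic or is listed in
    `wordchars` (a hypothetical stand-in for the WORDCHARS setting).
    """
    words, current = [], []
    for ch in text:
        if ch.isalpha() or ch in wordchars:
            current.append(ch)
        elif current:
            # Any other character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


sentence = "I’ve been surprised that it doesn’t work as expected!"

# Without ’ among the word characters, the apostrophe splits the words.
print(tokenize(sentence))

# With ’ among the word characters, "I’ve" and "doesn’t" stay whole.
print(tokenize(sentence, wordchars="0123456789’"))
```

In this model, the first call yields fragments such as `ve` and `doesn`, which resemble the misspellings reported above, while the second call keeps the contractions intact.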

Best regards!

@Jamim Jamim force-pushed the fix/apostrophe-handling branch from b8e8635 to a3fa7b4 Compare January 28, 2024 02:24
@marcoagpinto
Owner

Heya,

I only maintain en-GB and slightly improve en-ZA since the South African guy no longer maintains his language.

I will fix it for en-ZA soon.

The other languages are maintained by Kevin Atkinson.

Please open a ticket in Kevin's GitHub:
https://github.com/en-wl/wordlist

If by May Kevin doesn't fix it, I will change his files in my GitHub personally, since May and November are the releases for the next major version of LibreOffice.

Thanks!

@Jamim Jamim marked this pull request as draft January 28, 2024 12:03
@Jamim Jamim force-pushed the fix/apostrophe-handling branch from a3fa7b4 to 32c0843 Compare January 28, 2024 23:06
@Jamim
Author

Jamim commented Jan 28, 2024

Hello @marcoagpinto,

I've figured out that en-wl/wordlist has been abandoned for several years now, so the chances of any changes being merged there are not very high.
Also, I've found there is a related issue which is 8 years old:

From that issue I've learned that it would be better to add ’ rather than ' to WORDCHARS.

Since ’ has already been in WORDCHARS for en_GB for a while, I believe adding it to the other affix files is safe enough. And I don't think it's worth waiting any longer, so I hope this PR might be merged eventually.

Thanks!

@Jamim Jamim marked this pull request as ready for review January 28, 2024 23:46
@marcoagpinto
Owner

Heya,

That is what I tell everyone who complains about en-US: “Write on Kevin's GitHub and good luck”.

I will add it manually tomorrow; I don't like to merge pull requests.

On 1-FEB, it will go live.

Thanks!

@marcoagpinto
Owner

The task has been growing since the dictionaries' maintainers are vanishing from the globe; it is even me who is fixing en-ZA.

@marcoagpinto
Owner

It is released and fixed:

MAGP 2024-02-01

Updated the Dictionaries:
- British (Marco A.G.Pinto)
  * 181 new words
- US + CA + AU
  * Fix: apostrophe handling, by adding: WORDCHARS 0123456789’ to the .aff.
- ZA
  * Fix: Removed the: ICONV ’ ' because it was already at the end of the .aff;
    Fix: apostrophe handling, by adding: WORDCHARS 0123456789’ to the .aff;
    Improved flag J adding 424 words.
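
To confirm that an affix file carries the fix, its WORDCHARS lines can be inspected. The following is a minimal sketch: `wordchars_of` is a hypothetical helper, the sample text is made up for illustration rather than taken from the released files, and a real .aff file would need to be read with the encoding it declares.

```python
def wordchars_of(aff_text):
    """Return the concatenated arguments of all WORDCHARS lines in .aff text."""
    chars = []
    for line in aff_text.splitlines():
        # Split into the directive name and the rest of the line.
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "WORDCHARS":
            chars.append(parts[1].strip())
    return "".join(chars)


# Illustrative affix fragment (an assumption, not the actual file contents).
sample_aff = "SET UTF-8\nWORDCHARS 0123456789’\n"

# Prints whether the typographic apostrophe is among the word characters.
print("’" in wordchars_of(sample_aff))
```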
