Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IBEX wiki Checker Python Library Upgraded #4

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RaiBishal
Copy link
Contributor

@RaiBishal RaiBishal commented Sep 17, 2019

For ticket: ISISComputingGroup/IBEX#4620

  • Edited batch file to install pyspellchecker
  • Edited page_tests.py to use new pyspellchecker library
  • Added new words to be ignored in words.txt

- Edited page_tests.py to use new pyspellchecker library
- Added new words to be ignored in words.txt
Copy link
Contributor

@DominicOram DominicOram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the library should ideally have meant that we didn't need to change words.txt. Where are all the new spelling errors coming from? If they're genuine problems that the old library didn't spot we should fix them. If they are not genuine we should make sure this spell checker ignores them in the same way as the old one. If the new library doesn't provide the flexibility of the old one and it would be painful to add then we can put this as won't fix but we should make a solid decision to do this.

@@ -5,9 +5,13 @@
import requests
import concurrent.futures

from spellchecker import SpellChecker as PySpellChecker
from enchant.checker import SpellChecker
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove references to enchant and uninstall it from python if we're no longer using it.


filters = [URLFilter, EmailFilter, MentionFilter, WikiWordFilter]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting rid of all these filters means we're now gonna have loads more errors for emails and urls, which we needn't check.

_move
_move_
_moving1
_my_instrument_name_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the previous library split this into my, instrument and name then checked all of them separately, which is much better as it would lead to less errors

@RaiBishal
Copy link
Contributor Author

The new spellchecking library is very limited compared to the previous one. The only other way I can think of fixing this issue is by adding words during runtime however it is not sustainable. I shall look into JamSpell library. If there are any other better libraries you can think of please post it here.

@DominicOram
Copy link
Contributor

I can't think of any. It may be if we can't find anything sensible we just don't worry about it being depreciated

@RaiBishal
Copy link
Contributor Author

I cannot find any good libraries which are maintained currently, apart from the ones which use C/C++ library and needs to compile.

@DominicOram
Copy link
Contributor

Ok, can you write a note on the original ticket saying which libraries you looked at and what was wrong with them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants