IBEX wiki Checker Python Library Upgraded #4

Open · wants to merge 1 commit into base: master
1 change: 1 addition & 0 deletions run_tests.bat
@@ -1,4 +1,5 @@
set PYTHONIOENCODING=utf-8
%~dp0Python\python.exe -m pip install requests
%~dp0Python\python.exe -m pip install mock
%~dp0Python\python.exe -m pip install pyspellchecker
%~dp0Python\python.exe run_tests.py --remote
19 changes: 13 additions & 6 deletions tests/page_tests.py
@@ -5,9 +5,13 @@
import requests
import concurrent.futures

from spellchecker import SpellChecker as PySpellChecker
from enchant.checker import SpellChecker
Contributor:

Should remove references to enchant and uninstall it from Python if we're no longer using it.
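A minimal sketch of that cleanup, assuming the enchant-based checker is dropped entirely; the pyenchant distribution name and the exact uninstall command are assumptions, not part of this PR:

# Sketch: module header after removing enchant, keeping only pyspellchecker.
# The two "from enchant..." imports and the enchant SpellChecker call further
# down would be deleted, and the library uninstalled once, e.g.:
#   python -m pip uninstall -y pyenchant
from spellchecker import SpellChecker as PySpellChecker

# Single shared instance, reused across all page tests.
checker_py = PySpellChecker()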

from enchant.tokenize import URLFilter, EmailFilter, WikiWordFilter, MentionFilter

# Create this checker once at module level: constructing duplicate instances led to memory corruption.
# Not sure why that happens; perhaps the garbage collector is not quick enough?
checker_py = PySpellChecker()

def strip_between_tags(expression, text):
if text is None:
@@ -25,7 +29,6 @@ def strip_between_tags(expression, text):
new_text += text[matches[-1].end(): len(text)]
return new_text


class PageTests(unittest.TestCase):
def __init__(self, methodName, ignored_items, wiki_info=None):
"""
@@ -81,6 +84,7 @@ def strip_inline_code_blocks(text):
def remove_bold_and_italics(text):
return text.replace("*", "")


with open(self.page, "r", encoding="utf-8") as wiki_file:
text = remove_bold_and_italics(
replace_selected_specials_with_whitespace(
@@ -97,17 +101,20 @@ def remove_bold_and_italics(text):
)
)
)
# Split the text into a list of words
list_of_words = re.findall(r"\w+", text)
# Collect the words that are not found in the dictionary
errors = checker_py.unknown(list_of_words)

filters = [URLFilter, EmailFilter, MentionFilter, WikiWordFilter]
Contributor:

Getting rid of all these filters means we're now going to get loads more errors for emails and URLs, which we needn't check.
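One way to keep that filtering without enchant would be to strip URLs, email addresses and @mentions from the text before it is tokenised, since pyspellchecker has no built-in filter mechanism. A rough sketch, with illustrative patterns and a hypothetical helper name (not part of this PR):

import re

# Approximate enchant's URLFilter, EmailFilter and MentionFilter by deleting
# those tokens before the text is split into words.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w.-]+\.\w+\b")
MENTION_RE = re.compile(r"@\w+")

def strip_unspellcheckable(text):
    for pattern in (URL_RE, EMAIL_RE, MENTION_RE):
        text = pattern.sub(" ", text)
    return text

# Applied just before the existing tokenising step:
# text = strip_unspellcheckable(text)
# list_of_words = re.findall(r"\w+", text)
# errors = checker_py.unknown(list_of_words)

A WikiWordFilter equivalent (skipping CamelCase words) could be added in the same way if needed.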

checker = SpellChecker("en_UK", filters=filters, text=text)

failed_words = filter_upper_case(
{err.word for err in checker if err.word.lower() not in self.ignored_words})
failed_words = filter_upper_case({
err for err in errors if err.lower() not in self.ignored_words
})

if len(failed_words) > 0:
self.fail("The following words were spelled incorrectly in file {}: \n {}".format(
self.page, "\n ".join(failed_words)
))


def test_GIVEN_a_page_IF_it_contains_urls_WHEN_url_loaded_THEN_response_is_http_ok(self):
