Issue#58: Adding Grammar Analyzer feature to GatorMiner #90
base: master
Conversation
Hi @Mai1902, thanks for working on this feature! It seems like there are still a couple of errors in the code that need to be fixed so that the program can be executed. See the details below.
Additionally, I noticed that a language model is installed during execution, and it is quite large. It would probably be best to clarify somewhere that the grammar feature requires installing such a model. I also want to report that the plots seem to take forever to load; is there a bug here, or is that just part of the feature?
err_num = err_num + len(matches)

# Store all alphanumeric characters in the reflection in a list
words = re.sub('[^0-9a-zA-Z]+', ' ', str(text)).lower().split()
I'm not great at regex, but is this line tokenizing the text?
This line is expected to replace all non-alphanumeric characters in the text with whitespace, tokenize the result, and store the tokens in a list called words.
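For reference, here is a minimal standalone sketch of what that line does on a hypothetical input (the sample text is made up for illustration):

import re

text = "Hello, world! This is reflection #1."
# Replace every non-alphanumeric run with a space, lowercase, then split into tokens
words = re.sub('[^0-9a-zA-Z]+', ' ', str(text)).lower().split()
print(words)  # ['hello', 'world', 'this', 'is', 'reflection', '1']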
Okay, thanks! That's what I thought! If that's the case, the tokens are already processed and stored in the data frame when markdown documents are being imported. See if you can just reuse them instead of retokenizing from the text.
Also, don't forget to resolve all the merge conflicts.
Codecov Report
@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
+ Coverage   91.66%   92.09%   +0.42%
==========================================
  Files           6        7       +1
  Lines         240      253      +13
==========================================
+ Hits          220      233      +13
  Misses         20       20
Please update your branch/PR with master.
Please make sure to resolve your conflicts.
What is the current behavior?
We believe that grammar is one of the important criteria for judging the quality of a reflection; hence, our team wants to add a grammar error analyzer as a new feature to GatorMiner. This is the implementation of issue #58.
Purpose of this feature:
The Grammar Analyzer is a tool that scans an assignment and outputs two things: it shows where grammar errors occur and which words are involved, and it reports a score/grade based on the number of grammatical errors. This tool will be a great addition to GatorMiner because it adds another dimension for revealing an assignment's meticulousness, quality, and integrity.
What is the new behavior if this PR is merged?
As of right now, the source code of the grammar analyzer has been fully implemented and tested, and it is able to return the correct number of errors and the percentage of grammar errors per number of words in a text. However, this feature only works with short texts rather than long texts like reflections.
We have also added a new page to streamline_web.py under the title Grammar Checker, but it doesn't return anything yet since we are still working on implementing the appropriate data frame.
Type of change
Please describe the pull request as one of the following:
Other information: Full documentation on our pending work is as follows:
Current outcome of the latest push:
The current outcome of the implementation is that the code is functional but inefficient. What we were hoping to accomplish was a properly working analyzer that scans through all the input values and posts a table of the student ID, the number of errors, and the percentage of errors. The code works; however, we are having issues with the efficiency of the library we are using because it takes too long to run. Though we got what we wanted in terms of functioning code, it doesn't run as efficiently as we were hoping.
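To illustrate the table we are aiming for, here is a minimal sketch, not the actual GatorMiner code: the column names, the count_errors helper, and the sample reflections are all hypothetical placeholders.

import pandas as pd

def count_errors(text):
    # Hypothetical placeholder standing in for the real grammar check
    return 2

reflections = {"student_01": "This are a short reflection.",
               "student_02": "Another reflections with error."}

rows = []
for student_id, text in reflections.items():
    n_words = len(text.split())
    n_errors = count_errors(text)
    rows.append({"Student ID": student_id,
                 "Errors": n_errors,
                 "Error %": round(100 * n_errors / n_words, 2)})

# One row per student: ID, error count, and error percentage
print(pd.DataFrame(rows))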
Implementation of source code explanation:
The source code adopts the grammar checking tool from the language-tool-python library. The only method in this file takes input text as a parameter and checks for the number of grammar errors in each line of the text. It returns the number of grammar errors in the text and the error percentage per number of words.
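A minimal sketch of that approach is shown below; the function name, the line-by-line loop details, and the return format are assumptions for illustration, not necessarily the exact code in this PR.

import re
import language_tool_python

def grammar_check(text):
    """Return (error_count, error_percentage) for the given text."""
    # The first call downloads the LanguageTool backend, which is large
    tool = language_tool_python.LanguageTool('en-US')
    err_num = 0
    for line in str(text).splitlines():
        matches = tool.check(line)       # detected grammar issues in this line
        err_num = err_num + len(matches)
    # Tokenize the text to compute errors per number of words
    words = re.sub('[^0-9a-zA-Z]+', ' ', str(text)).lower().split()
    error_percentage = 100 * err_num / len(words) if words else 0
    return err_num, error_percentage

print(grammar_check("This are a sentence with one errors."))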
Current issue:
The code in this PR has been tested and proven to be correct in terms of grammar error detection. However, it only functions normally with a small input text (1-2 paragraphs at most). When our team tried to parse a large input text (a reflection), the code took an extremely long time to run and failed to produce any output, even though there are no bugs in the code.
After some experimentation, we realized that the language-tool-python library is a fork of the language_check library, which is already outdated. Hence, the tool is not efficient, resulting in an effectively infinite run time for the program.
Possible solution for future implementation:
This PR has:
Developers
@Mai1902 @Kevin487 @TheShiny1 @Batmunkh0419