Ignore tokens matching character ranges #75
Comments
I've found a Russian dictionary that includes case forms (35MB); it will work for our needs.
Hi, did the dictionary you found solve your problem?
Ignoring character ranges is an interesting idea though. I'll look into it, because it's already a problem for Chinese and other scripts that don't really use word breaks. It should be doable in the regex with
I've used the dictionary from this repo: https://github.com/danakt/russian-words
Spellr completes in around 4 seconds for 650k lines of code on my 6-core MacBook.
We are very happy with the results; we now spend less time on trivial errors during code review.
Hi!
Is it possible to add an option to ignore character ranges for tokens?
If the whole token matches one ignored character set, it will be skipped. This will still catch mixed languages within a word but will ignore languages that use a different character set.
We (unfortunately) write some comments and strings in Russian, and it triggers a Spellr warning almost every time.
Simple dictionary checking doesn't work well with languages that have many grammatical cases (e.g. Russian, Hindi), because you have to add every case form of each word to validate properly, and I was unable to find such dictionaries.