
ask for the Stemming feature #20

Open
redstoneleo opened this issue Jan 26, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@redstoneleo

I'd like this feature:
https://github.com/binhetech/CyHunspell#stemming

@zverok
Owner

zverok commented Jan 30, 2022

A possible implementation of stemming with Spylls is demonstrated in this discussion: #19 (comment)

I'll gladly accept a PR that makes it more convenient, but unfortunately I don't have time to work on this myself.

@zverok zverok added the enhancement New feature or request label Jan 30, 2022
@redstoneleo
Author

I tested it with the word 'wrote', and the result is not accurate.

@zverok
Owner

zverok commented Oct 19, 2024

That just depends on the dictionary.

In the standard en-US dictionary, 'wrote' is specified as a separate word form (like most irregular verbs, I guess). Note that Hunspell and its dictionaries are designed, first and foremost, as a spell-checking tool, not a full-fledged linguistic analysis package.
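
To illustrate the point about separate word forms, here is a minimal toy sketch. It is not the Spylls or Hunspell API: the names `TOY_STEMS`, `TOY_SUFFIXES`, and `toy_stems` are hypothetical, and the tiny dictionary is made up for the example. It only shows why affix-based stemming returns an irregular form unchanged when the dictionary lists that form as its own entry.

```python
# Toy model only: this is NOT the Spylls or Hunspell API.
# Hypothetical miniature dictionary: entries mapped to the suffix flags they accept.
TOY_STEMS = {
    "walk": {"D"},   # "D" flag: entry may take the -ed suffix
    "write": set(),  # regular derived forms omitted for brevity
    "wrote": set(),  # irregular form stored as its own entry
}

# Hypothetical suffix rules: flag -> suffix that the rule appends
TOY_SUFFIXES = {"D": "ed"}


def toy_stems(word):
    """Return every dictionary entry the word can be reduced to."""
    found = []
    # Case 1: the word is itself a dictionary entry.
    if word in TOY_STEMS:
        found.append(word)
    # Case 2: stripping a suffix yields an entry that allows that suffix.
    for flag, suffix in TOY_SUFFIXES.items():
        if word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if flag in TOY_STEMS.get(candidate, set()):
                found.append(candidate)
    return found


print(toy_stems("walked"))  # ['walk']
print(toy_stems("wrote"))   # ['wrote']
```

In this toy model nothing links 'wrote' to 'write', so the only stem that affix stripping can report for 'wrote' is 'wrote' itself.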

@redstoneleo
Author

I tested with https://github.com/cdhigh/chunspell and it worked as expected.

@zverok
Owner

zverok commented Oct 20, 2024

Please show which Hunspell dictionaries you used with each library, and the code you tried.

@redstoneleo
Author

For chunspell, only the en_US dictionaries are available by default (see https://github.com/cdhigh/chunspell?tab=readme-ov-file#dictionaries):

from hunspell import Hunspell
hunSpell = Hunspell()
print(hunSpell.stem('wrote'))  # gives ('write', 'wrote')

With spylls, I used the code you gave in the comment #19 (comment):

from spylls.hunspell import Dictionary
from spylls.hunspell.algo.capitalization import Type as CapType

# en_US dictionary is distributed with spylls
# See docs to load other dictionaries
dictionary = Dictionary.from_files('en_US')

for form in dictionary.lookuper.affix_forms('wrote', captype=CapType.NO):
    print(form.stem)  # only gives 'wrote'
