Short answer: the design of wordsegment is based on a trillion-word corpus and unfortunately that corpus includes no numbers. But I think it's a reasonable feature to support. Pull request welcome.
Longer answer: the wordsegment module was originally intended for input like "thisisanexample". That's where it excels. If your input carries extra information, as in "this, is an: example", wordsegment does not use that punctuation, so it may be better to pre-process the input. A simple regular expression like re.finditer(r'[a-zA-Z0-9%$]+', text) can split the input into tokens first and so take advantage of the information the punctuation already provides.
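To make the pre-processing idea concrete, here is a minimal sketch. It is only one way to do it: the helper name `presegment`, the `segment_token` parameter, and the choice to pass tokens containing digits or symbols through untouched are all assumptions for illustration, not part of wordsegment.

```python
import re

# Same character class as the regular expression suggested above.
TOKEN_PATTERN = re.compile(r'[a-zA-Z0-9%$]+')


def presegment(text, segment_token):
    """Split text on punctuation and whitespace, then segment each token.

    segment_token is any callable that maps a string to a list of words.
    (Hypothetical helper; not part of wordsegment itself.)
    """
    words = []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group()
        if token.isalpha():
            # Purely alphabetic token: hand it to the segmenter.
            words.extend(segment_token(token))
        else:
            # Assumption: tokens with digits, % or $ are kept as-is,
            # since the corpus has no numbers to score them with.
            words.append(token)
    return words
```

With wordsegment installed, something like `presegment(text, segment)` after `from wordsegment import load, segment` and `load()` should work, though whether keeping numeric tokens intact is the right policy depends on your data.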
If you would like to contract me to fix the issue, then I am open to that as well.
Raising an issue that I faced while using this package.
Code for Reproducing the issue:
Actual Output:
Expected Output:
Tested on Python versions:
wordsegment version:
StackOverflow Question Link: