Return a list of the most probable segmentations. #19

Open
rafaveguim opened this issue Mar 2, 2018 · 3 comments
rafaveguim commented Mar 2, 2018

It would be great if wordsegment returned a list of the most probable segmentations.
For instance:

> ws.rank("nobodyelse")
[ ["nobody", "else"], ["no", "body", "else"], ...]

or

> ws.probabilities("nobodyelse")
[
[ ["nobody", "else"], ["no", "body", "else"], ...],
[ 0.727362, 0.0012372, ...]
]
@grantjenks
Owner

Why? What's your use case? And how do you define "most probable"?

@rafaveguim
Author

Your algorithm is probabilistic, or at least it uses some sort of score. If I'm using wordsegment as part of a language modelling pipeline, I may want to propagate that measure of uncertainty.

I'm using it to model language in passwords. After a password is segmented, part-of-speech and semantic tags are inferred. The model sees these features and updates its beliefs. The devil is in the ambiguity: therapistfinder has two meanings depending on how it is segmented. Ideally, the model should account for both.
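
To make the request concrete, here is a minimal sketch of what a k-best segmenter could look like. It is not wordsegment's implementation or API: `TOY_COUNTS`, `rank`, and `probabilities` are hypothetical names, the counts are invented for illustration, and it assumes a plain unigram model that only accepts in-vocabulary words. The returned probabilities are normalized over the k candidates only, which is just one possible answer to "how do you define most probable".

```python
import heapq
import math

# Hypothetical toy counts, invented for illustration; a real model would
# use corpus frequencies instead.
TOY_COUNTS = {"nobody": 30, "else": 50, "no": 100, "body": 40, "the": 780}


def rank(text, counts, k=3):
    """Return up to k (log_prob, words) pairs for text, best first.

    Assumes a unigram model: the score of a segmentation is the sum of
    the log-probabilities of its words, so a k-best dynamic program over
    prefixes is sufficient.
    """
    total = sum(counts.values())
    max_len = max(len(word) for word in counts)
    # best[i] holds the k best segmentations of text[:i].
    best = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word not in counts:
                continue  # out-of-vocabulary words are simply skipped
            log_p = math.log(counts[word] / total)
            for prev_score, prev_words in best[start]:
                candidates.append((prev_score + log_p, prev_words + [word]))
        best[end] = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return best[len(text)]


def probabilities(text, counts, k=3):
    """Return (segmentations, probabilities), normalized over the k best."""
    ranked = rank(text, counts, k)
    weights = [math.exp(score) for score, _ in ranked]
    norm = sum(weights)
    return [words for _, words in ranked], [w / norm for w in weights]
```

With these toy counts, `probabilities("nobodyelse", TOY_COUNTS, k=2)` ranks `["nobody", "else"]` ahead of `["no", "body", "else"]`. A bigram model like the one wordsegment uses would need the previous word as part of the dynamic-programming state, which this sketch leaves out.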

@grantjenks
Owner

Ok, pull request welcome.
