Return a list of the most probable segmentations. #19

rafaveguim · 2018-03-02T14:00:46Z

It would be great if wordsegment could return a ranked list of the most probable segmentations, rather than only the single best one.
For instance:

> ws.rank("nobodyelse")
[ ["nobody", "else"], ["no", "body", "else"], ...]

or

> ws.probabilities("nobodyelse")
[
[ ["nobody", "else"], ["no", "body", "else"], ...],
[ 0.727362, 0.0012372, ...]
]
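For what it's worth, here is a minimal sketch of how a ranked list could be computed. It does not use wordsegment's internals: the counts in `TOY_COUNTS`, the unknown-word penalty, and the `rank` function are all made up for illustration, and it scores with unigrams only. The idea is a dynamic program that keeps the k best partial segmentations for every prefix instead of just one.

```python
import math

# Toy unigram counts, invented for this example (not wordsegment's data).
TOY_COUNTS = {"no": 300, "body": 100, "nobody": 120, "else": 150}
TOTAL = sum(TOY_COUNTS.values())
MAX_WORD_LEN = 24  # assumed cap on candidate word length


def log_prob(word):
    """Log unigram probability; unknown words get a length-based penalty."""
    count = TOY_COUNTS.get(word)
    if count is None:
        return math.log(1.0 / (TOTAL * 10 ** len(word)))
    return math.log(count / TOTAL)


def rank(text, k=3):
    """Return up to k (log_score, words) pairs for text, best first."""
    # best[i] holds up to k (log_score, words) candidates for text[:i].
    best = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            word_score = log_prob(word)
            # Extend every kept segmentation of text[:start] with this word.
            for score, words in best[start]:
                candidates.append((score + word_score, words + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        best[end] = candidates[:k]
    return best[len(text)]


for score, words in rank("nobodyelse", k=2):
    print(round(score, 3), words)
```

With these toy counts the two best candidates are the ones from the example above, in that order. With real corpus counts or a bigram model the ranking may differ.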

grantjenks · 2018-03-02T23:28:07Z

Why? What's your use case? And how do you define "most probable"?

rafaveguim · 2018-03-03T05:11:14Z

Your algorithm is probabilistic, or at least it uses some sort of score. If I'm using wordsegment as part of a language modelling pipeline, I may want to propagate that measure of uncertainty.

I'm using it to model language in passwords. After a password is segmented, part-of-speech and semantic tags are inferred. The model sees these features and updates its beliefs. The devil is in the ambiguity: therapistfinder has two meanings depending on how it is segmented. Ideally, the model should account for both.
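To make "propagate that measure of uncertainty" concrete, here is a rough sketch of what I have in mind downstream. It assumes the segmenter hands back a log score per candidate; the scores below are invented, and `normalize` is a hypothetical helper, not part of wordsegment. Note that the result is only a relative weight within the returned candidate list, not a true probability over all possible segmentations.

```python
import math


def normalize(log_scores):
    """Turn log scores into weights that sum to 1 over the candidate list."""
    top = max(log_scores)
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = [math.exp(score - top) for score in log_scores]
    total = sum(weights)
    return [weight / total for weight in weights]


# Two candidate segmentations with made-up log scores.
candidates = [["therapist", "finder"], ["the", "rapist", "finder"]]
log_scores = [-12.0, -15.0]

probs = normalize(log_scores)
for words, prob in zip(candidates, probs):
    print(words, round(prob, 4))
```

The tagger could then weight the features from each segmentation by these values instead of committing to a single one.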

grantjenks · 2018-03-09T01:02:58Z

Ok, pull request welcome.
