
Too basic sentence segmentation #1

Open
efi opened this issue Apr 27, 2012 · 3 comments

@efi

efi commented Apr 27, 2012

The sentence counter is not really usable in real-world scenarios and should at least support the inclusion of a common abbreviation list for the current language (or multiple languages?).

See http://en.wikipedia.org/wiki/Text_segmentation#Sentence_segmentation for an example.

While people write whole dissertations about this topic, your plugin should of course not go that deep, but it should at least provide some basic options to prevent "false positives" for sentence boundaries.
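To illustrate the false positives described here, the following is a minimal Python sketch, not the plugin's code (the thread does not show the plugin's implementation or API). It contrasts a naive punctuation-based count with one that skips a small abbreviation list. `COMMON_ABBREVIATIONS`, `naive_sentence_count` and `sentence_count` are hypothetical names, and the abbreviation list is illustrative only.

```python
import re

# Hypothetical, illustrative abbreviation list (lowercase, with trailing period).
# A real list would be language-specific and much longer.
COMMON_ABBREVIATIONS = frozenset({"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc.", "vs."})


def naive_sentence_count(text: str) -> int:
    """Treat every run of '.', '!' or '?' as a sentence boundary."""
    return len(re.findall(r"[.!?]+", text))


def sentence_count(text: str, abbreviations=COMMON_ABBREVIATIONS) -> int:
    """Count boundaries, skipping whitespace-separated tokens that are known abbreviations."""
    count = 0
    for token in text.split():
        # Ignore closing quotes/brackets so that 'rain."' still ends a sentence.
        word = token.rstrip("\"')]")
        if not word or word[-1] not in ".!?":
            continue  # no terminal punctuation, so not a boundary
        if word.lower() in abbreviations:
            continue  # a known abbreviation, not a sentence end
        count += 1
    return count


text = "Dr. Smith met Mr. Jones. They talked, e.g. about rain."
print(naive_sentence_count(text))  # 6: "Dr.", "Mr." and both periods in "e.g." are false positives
print(sentence_count(text))        # 2
```

One known limitation of this simple approach: an abbreviation that genuinely ends a sentence (for example a sentence ending in "etc.") is not counted as a boundary.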

Thanks!

@ghost ghost assigned matthieua May 17, 2012
@matthieua
Member

That's an excellent point. I'm definitely considering adding a list of the most common exceptions. However, I'm not planning to support every language, since I want to keep the plugin as simple as possible. You would also have the option to add your own list of exceptions when initialising the plugin. Would that make sense to you?
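The approach proposed here, a small built-in exception list that users can extend when initialising, might look roughly like the following Python sketch. `SentenceCounter`, `DEFAULT_EXCEPTIONS`, `exceptions` and `replace_defaults` are hypothetical names chosen for illustration, not options of the actual plugin.

```python
from typing import Iterable, Optional


class SentenceCounter:
    """Hypothetical counter whose exception list is configured at construction time."""

    # Illustrative built-in list of common English abbreviations (not the plugin's list).
    DEFAULT_EXCEPTIONS = frozenset({"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc."})

    def __init__(self, exceptions: Optional[Iterable[str]] = None,
                 replace_defaults: bool = False) -> None:
        user = {e.lower() for e in (exceptions or [])}
        # Either extend the defaults or replace them entirely (e.g. for another language).
        self.exceptions = user if replace_defaults else set(self.DEFAULT_EXCEPTIONS) | user

    def count(self, text: str) -> int:
        """Count tokens ending in '.', '!' or '?' that are not listed exceptions."""
        count = 0
        for token in text.split():
            word = token.rstrip("\"')]")  # ignore closing quotes/brackets
            if word and word[-1] in ".!?" and word.lower() not in self.exceptions:
                count += 1
        return count


text = "See Fig. 2 and Dr. Lee's notes. Done."
print(SentenceCounter().count(text))         # 3 ("Fig." is a false positive)
print(SentenceCounter(["Fig."]).count(text))  # 2 (domain-specific exception added)
```

The `replace_defaults` flag is one possible way to swap in a list for another language instead of extending the English defaults, which keeps the built-in list small as suggested in the comment.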

That feature is planned for version 1.1.

@efi
Author

efi commented Jun 3, 2012

Hi. I think that would totally suffice and keep the plugin concise while allowing for very sophisticated (and maybe domain-specific) user-defined abbreviation lists.

@matthieua
Member

Thanks for your feedback and feel free to contribute :)


2 participants