Release 0.5.1 · bminixhofer/nlprule

Breaking changes

Changes the focus from Vec<Token> to Sentence (#54): pipe and sentencize now return iterators over Sentence / IncompleteSentence.
Removes the special SENT_START token (it is now only used internally). Each token now corresponds to at least one character in the input text.
Makes the fields of Token and IncompleteToken private and adds getter methods (#54).
char_span and byte_span are replaced by a Span struct which keeps track of char and byte indices at the same time (#54). For example, to get the byte range, use token.span().byte().
Spans are now relative to the input text instead of to sentence boundaries (#53, thanks @drahnr!).
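The idea of a span that tracks char and byte indices together, measured from the start of the whole input, can be sketched as follows. This is a hypothetical stand-in written for illustration. Only the getter name byte() comes from these notes. The struct layout, the char() getter, and the from_byte_range constructor are assumptions and are not nlprule's actual implementation.

```rust
use std::ops::Range;

/// Hypothetical span type that keeps byte and char indices together.
/// The byte() getter follows token.span().byte() from the release notes;
/// everything else is an illustrative assumption, not nlprule's code.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    byte: Range<usize>,
    char: Range<usize>,
}

impl Span {
    /// Builds a span from a byte range into the full input text and derives
    /// the matching char range by counting chars. Panics if either end of
    /// the byte range does not fall on a char boundary.
    pub fn from_byte_range(text: &str, byte: Range<usize>) -> Span {
        assert!(text.is_char_boundary(byte.start) && text.is_char_boundary(byte.end));
        let char_start = text[..byte.start].chars().count();
        let char_end = char_start + text[byte.clone()].chars().count();
        Span { byte, char: char_start..char_end }
    }

    /// Byte range, usable for slicing the input string directly.
    pub fn byte(&self) -> Range<usize> {
        self.byte.clone()
    }

    /// Char range, counted in Unicode scalar values.
    pub fn char(&self) -> Range<usize> {
        self.char.clone()
    }
}

fn main() {
    // Offsets are relative to the whole input, not to the second sentence.
    let text = "Hi. Ünïcode here.";
    let start = text.find("Ünïcode").unwrap();
    let span = Span::from_byte_range(text, start..start + "Ünïcode".len());

    // Byte and char ranges differ because Ü and ï each take two bytes in UTF-8.
    println!("byte {:?}, char {:?}", span.byte(), span.char());
    println!("{}", &text[span.byte()]);
}
```

Storing both ranges up front means callers never have to re-derive char offsets from byte offsets. That conversion is linear in the text length, which is a likely reason for keeping the two together.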

New features

The regex backend can now be either Oniguruma or fancy-regex, selected with the features regex-onig and regex-fancy respectively. regex-onig is the default.
nlprule now compiles to WebAssembly. WebAssembly support is guaranteed for future versions and tested in CI.
A new selector API to select individual rules (details documented in nlprule::rule::id). For example:

use nlprule::{rule::id::Category, Rules};
use std::convert::TryInto;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut rules = Rules::new("path/to/en_rules.bin")?;

    // disable rules named "confusion_due_do" in category "confused_words"
    rules
        .select_mut(
            &Category::new("confused_words")
                .join("confusion_due_do")
                .into(),
        )
        .for_each(|rule| rule.disable());

    // disable all grammar rules
    rules
        .select_mut(&Category::new("grammar").into())
        .for_each(|rule| rule.disable());

    // a string syntax where slashes are the separator is also supported
    rules
        .select_mut(&"confused_words/confusion_due_do".try_into()?)
        .for_each(|rule| rule.enable());

    Ok(())
}
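To show how a slash-separated string such as "confused_words/confusion_due_do" can map onto a category plus an optional group, here is a minimal self-contained sketch. The RuleId and ParsedSelector types and the is_match method are hypothetical and exist only for illustration. nlprule's real selector types are documented in nlprule::rule::id and may accept other forms.

```rust
use std::convert::TryFrom;

/// Hypothetical rule identifier: a category plus a group name.
#[derive(Debug)]
struct RuleId {
    category: String,
    group: String,
}

/// Hypothetical selector: matches a whole category, or one group within it.
#[derive(Debug, PartialEq)]
struct ParsedSelector {
    category: String,
    group: Option<String>,
}

impl TryFrom<&str> for ParsedSelector {
    type Error = String;

    /// Parses "category" or "category/group"; anything else is rejected.
    fn try_from(s: &str) -> Result<Self, Self::Error> {
        let mut parts = s.split('/');
        // split always yields at least one item, so this unwrap cannot fail.
        let category = parts.next().unwrap();
        let group = parts.next();
        if category.is_empty() || group == Some("") || parts.next().is_some() {
            return Err(format!("invalid selector: {:?}", s));
        }
        Ok(ParsedSelector {
            category: category.to_string(),
            group: group.map(str::to_string),
        })
    }
}

impl ParsedSelector {
    /// A category-only selector matches every rule in that category.
    fn is_match(&self, id: &RuleId) -> bool {
        self.category == id.category
            && self.group.as_deref().map_or(true, |g| g == id.group)
    }
}

fn main() {
    // "some_group" is a made-up group name used only for this demo.
    let rules = vec![
        RuleId { category: "confused_words".into(), group: "confusion_due_do".into() },
        RuleId { category: "grammar".into(), group: "some_group".into() },
    ];
    let selector = ParsedSelector::try_from("confused_words/confusion_due_do").unwrap();
    let matched: Vec<&RuleId> = rules.iter().filter(|r| selector.is_match(r)).collect();
    println!("{:?}", matched);
}
```

In this sketch a selector without a group component behaves like Category::new("grammar") in the example above and matches every rule in the category.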
