
config fastner

Jianlin Shi edited this page Aug 23, 2017 · 1 revision

Token based rules

Each rule consists of a list of elements that specify the tokens to be matched. Token based rules support 4 wildcards:

  1. \w+: represents any token, including a word or punctuation
  2. \> followed by a number X: represents any number greater than X
  3. \< followed by a number X: represents any number smaller than X
  4. \d+: represents any number

Examples:

  • \w+ cough: This rule will match "dry cough", "productive cough", etc.
  • temp \> 38 \< 40: This rule will match "temp 38.1", "temp 39.5", etc.
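To make the wildcard semantics concrete, here is a minimal Python sketch of how such a rule could be evaluated. It is not FastNER's implementation: the function names (`parse_rule`, `element_matches`, `find_matches`) are hypothetical, the tokenizer is a plain whitespace split, literal matching is assumed to be case-insensitive, and consecutive \> and \< wildcards are assumed to constrain the same token (which is what the "temp" example suggests).

```python
import re

# A token counts as a number if it looks like 38, 39.5, -2, ...
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_rule(rule):
    # Split the rule into elements. "\>" and "\<" are separated from the
    # number that follows them, so "\<40" and "\< 40" are both accepted.
    parts = re.findall(r"\\[<>]|\S+", rule)
    elements = []
    i = 0
    while i < len(parts):
        part = parts[i]
        if part in ("\\>", "\\<"):
            if i + 1 >= len(parts):
                raise ValueError("comparison wildcard needs a number")
            bound = float(parts[i + 1])
            # Assumption: adjacent comparisons narrow one numeric token.
            if elements and elements[-1][0] == "range":
                _, low, high = elements.pop()
            else:
                low, high = float("-inf"), float("inf")
            if part == "\\>":
                low = max(low, bound)
            else:
                high = min(high, bound)
            elements.append(("range", low, high))
            i += 2
            continue
        if part == "\\w+":
            elements.append(("any",))
        elif part == "\\d+":
            elements.append(("number",))
        else:
            elements.append(("literal", part.lower()))
        i += 1
    return elements


def element_matches(element, token):
    kind = element[0]
    if kind == "any":
        return True
    if kind == "literal":
        return token.lower() == element[1]
    if NUMBER.fullmatch(token) is None:
        return False
    if kind == "number":
        return True
    _, low, high = element
    return low < float(token) < high


def find_matches(rule, text):
    # Slide a window of len(elements) tokens over the text.
    elements = parse_rule(rule)
    tokens = text.split()
    size = len(elements)
    if size == 0:
        return []
    hits = []
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        if all(element_matches(e, t) for e, t in zip(elements, window)):
            hits.append(" ".join(window))
    return hits
```

With these assumptions, `find_matches(r"\w+ cough", "patient has dry cough today")` returns `["dry cough"]`, and `find_matches(r"temp \> 38 \< 40", "temp 39.5 on admission")` returns `["temp 39.5"]`, while "temp 40.2" is not matched.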

Additionally, \( and \) can be used to match only a subset of a rule, for example:

  • dry \( cough \): This rule will match only the word "cough" in the phrase "dry cough"
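The effect of \( and \) can be sketched in a few lines of Python. This is an illustration only, not FastNER code: `parse_grouped_rule` and `find_group` are hypothetical names, tokens are split on whitespace, and only literal tokens and \w+ are handled. The whole rule must match, but only the tokens between the markers are reported.

```python
def parse_grouped_rule(rule):
    # Returns (elements, group_start, group_end). The "\(" and "\)" markers
    # are removed from the element list; their positions delimit the part
    # of the match that is reported. Without markers, the whole rule is used.
    elements = []
    start, end = 0, None
    for part in rule.split():
        if part == "\\(":
            start = len(elements)
        elif part == "\\)":
            end = len(elements)
        else:
            elements.append(part)
    if end is None:
        end = len(elements)
    return elements, start, end


def find_group(rule, text):
    # Returns (token_index, matched_text) pairs for the marked subset.
    elements, start, end = parse_grouped_rule(rule)
    tokens = text.split()
    size = len(elements)
    if size == 0:
        return []
    results = []
    for i in range(len(tokens) - size + 1):
        window = tokens[i:i + size]
        if all(e == "\\w+" or e.lower() == t.lower()
               for e, t in zip(elements, window)):
            results.append((i + start, " ".join(window[start:end])))
    return results
```

For example, `find_group(r"dry \( cough \)", "a dry cough and a wet cough")` returns `[(2, "cough")]`: the "cough" after "dry" is reported, and the one after "wet" is not, because the full rule does not match there.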

Character based rules

FastCNER supports character based rules, which support the following wildcards:

  • ( Beginning of a capturing group
  • ) End of a capturing group

\ followed by one of the following characters:

  • p A punctuation
  • + An addition symbol (to distinguish it from the "+" that follows a wildcard)
  • ( A left parenthesis symbol
  • ) A right parenthesis symbol
  • d A digit
  • C A capital letter
  • c A lowercase letter
  • s A whitespace
  • a A non-whitespace character
  • u An uncommon character: not a letter, not a number, not a punctuation, not a whitespace
  • n A return (line break)
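One way to get a feel for the character based wildcards is to translate a rule into an ordinary regular expression. The sketch below does that in Python; it is an approximation for illustration, not how FastCNER works internally. The names `WILDCARDS` and `rule_to_regex` are hypothetical, the character classes are ASCII-only, \s is assumed to cover any whitespace, and a "+" after a wildcard is assumed to mean "one or more".

```python
import re
import string

# ASCII punctuation, escaped so it is safe inside a character class.
PUNCT = re.escape(string.punctuation)

# Assumed regex equivalents for the character that follows a backslash.
WILDCARDS = {
    "p": "[" + PUNCT + "]",                   # a punctuation
    "+": r"\+",                               # a literal addition symbol
    "(": r"\(",                               # a literal left parenthesis
    ")": r"\)",                               # a literal right parenthesis
    "d": "[0-9]",                             # a digit
    "C": "[A-Z]",                             # a capital letter
    "c": "[a-z]",                             # a lowercase letter
    "s": r"\s",                               # a whitespace
    "a": r"\S",                               # a non-whitespace character
    "u": "[^A-Za-z0-9\\s" + PUNCT + "]",      # an uncommon character
    "n": r"\n",                               # a return
}


def rule_to_regex(rule):
    pieces = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "\\":
            if i + 1 >= len(rule):
                raise ValueError("dangling backslash at end of rule")
            key = rule[i + 1]
            if key not in WILDCARDS:
                raise ValueError(f"unknown wildcard \\{key}")
            pieces.append(WILDCARDS[key])
            i += 2
        elif ch in "()+":
            # Unescaped parentheses delimit a capturing group; an unescaped
            # "+" is assumed to repeat the preceding wildcard.
            pieces.append(ch)
            i += 1
        else:
            # Any other character matches itself.
            pieces.append(re.escape(ch))
            i += 1
    return re.compile("".join(pieces))


match = rule_to_regex(r"\C\c+ (\d+)mg").search("Aspirin 81mg daily")
print(match.group(0), "|", match.group(1))
```

Under these assumptions the rule `\C\c+ (\d+)mg` matches "Aspirin 81mg" and captures "81" as the group, and `\d\+\d` matches the literal text "1+2".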