A potential enhancement to the rule-based taggers, both the spaCy and non-spaCy versions, is a verbose setting in which each tagged token also receives a tag identifying the rule that produced its USAS tags. Each of the tagger rules shown below would be given a label: the first rule `R1`, the second `R2`, and so on. When tagging in verbose mode, each token would carry one of these rule labels alongside its USAS tags; an example is shown below the rules. What do you think, @perayson? It could make the tagger more explainable and easier for users to debug.
Tagger Rules
1. If `pos_mapper` is not `None`, map the POS from the POS model to the first POS value in the `List` from the `pos_mapper`'s `Dict`. If the `pos_mapper` cannot map the POS from the POS model, go to step 9.
2. If `POS==punc`, label as `PUNCT`.
3. Look up the token and POS tag.
4. Look up the lemma and POS tag.
5. Look up the lower-case token and POS tag.
6. Look up the lower-case lemma and POS tag.
7. If `POS==num`, label as `N1`.
8. If there is another POS value in the `pos_mapper`, go back to step 2 with this new POS value; otherwise carry on to step 9.
9. Look up the token with any POS tag and choose the first entry in the lexicon.
10. Look up the lemma with any POS tag and choose the first entry in the lexicon.
11. Look up the lower-case token with any POS tag and choose the first entry in the lexicon.
12. Look up the lower-case lemma with any POS tag and choose the first entry in the lexicon.
13. Label as `Z99`, the unmatched semantic tag.
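To make the proposal more concrete, here is a minimal, self-contained sketch of how the rule cascade above could report a rule label together with the tags. It does not use the PyMUSAS code: the function name `tag_token_verbose`, the two toy lexicons, and the `token|pos` key format are all assumptions made for illustration, and the real taggers may be structured differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicons, for illustration only.
# POS_LEXICON is keyed by "text|pos"; LEMMA_LEXICON by text alone.
POS_LEXICON: Dict[str, List[str]] = {"car|noun": ["M3"]}
LEMMA_LEXICON: Dict[str, List[str]] = {"run": ["M1"]}


def tag_token_verbose(
    token: str,
    lemma: str,
    pos: str,
    pos_mapper: Optional[Dict[str, List[str]]] = None,
) -> Tuple[List[str], str]:
    """Return (usas_tags, rule_label) for a single token."""
    # The four text forms tried, in order, by rules 3-6 and again by rules 9-12.
    forms = [token, lemma, token.lower(), lemma.lower()]

    # Rule 1: map the POS model's tag to candidate lexicon POS values.
    # An unmappable POS yields no candidates, so we fall through to rule 9.
    if pos_mapper is None:
        candidates = [pos]
    else:
        candidates = pos_mapper.get(pos, [])

    # Rule 8 is the loop itself: retry rules 2-7 with the next POS value.
    for candidate in candidates:
        if candidate == "punc":
            return ["PUNCT"], "R2"
        for rule_number, text in enumerate(forms, start=3):  # rules 3-6
            tags = POS_LEXICON.get(f"{text}|{candidate}")
            if tags is not None:
                return tags, f"R{rule_number}"
        if candidate == "num":
            return ["N1"], "R7"

    # Rules 9-12: ignore the POS tag and take the lexicon entry for the text.
    for rule_number, text in enumerate(forms, start=9):
        tags = LEMMA_LEXICON.get(text)
        if tags is not None:
            return tags, f"R{rule_number}"

    # Rule 13: nothing matched.
    return ["Z99"], "R13"


print(tag_token_verbose("[", "[", "punc"))
print(tag_token_verbose("Running", "run", "verb"))
```

Rules 1 and 8 only steer control flow, so in this sketch they never appear as a returned label; whether verbose mode should also report them (for example, which mapped POS value was in use) is an open design question.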
Example
This is an example of what it could output if we went ahead with this idea:
```python
from pymusas.lexicon_collection import LexiconCollection
from pymusas.taggers.rule_based import USASRuleBasedTagger

welsh_lexicon_url = 'https://raw.githubusercontent.com/apmoore1/Multilingual-USAS/master/Welsh/semantic_lexicon_cy.tsv'
lexicon_lookup = LexiconCollection.from_tsv(welsh_lexicon_url, include_pos=True)
lemma_lexicon_lookup = LexiconCollection.from_tsv(welsh_lexicon_url, include_pos=False)
tagger = USASRuleBasedTagger(lexicon_lookup, lemma_lexicon_lookup)

output = tagger.tag_token(('[', '[', 'punc'), verbose=True)
usas_tags, rule = output
assert usas_tags == ['PUNCT']
# Second rule from the above rules, as it is
# a punctuation symbol
assert rule == 'R2'
```