This repository has been archived by the owner on Mar 6, 2019. It is now read-only.
The first entry in the generated test set corresponds to this CSV row:

`20000,"1 boneless pork tenderloin, about 1 pound",pork tenderloin,1.0,0.0,,"boneless, about 1 pound"`
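To see why `1` ends up with two candidate tags, the row can be split with Python's standard `csv` module. The column names below are an assumption (the issue only shows the raw row); the point is that `1` matches both the quantity field (`1.0`) and a word inside the comment field.

```python
import csv
import io

# Assumed column order; the issue shows only the raw CSV line.
FIELDS = ["index", "input", "name", "qty", "range_end", "unit", "comment"]

ROW = '20000,"1 boneless pork tenderloin, about 1 pound",pork tenderloin,1.0,0.0,,"boneless, about 1 pound"'


def parse_row(line):
    """Parse one CSV line into a dict keyed by the assumed column names."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(FIELDS, values))


entry = parse_row(ROW)

# The token "1" matches the quantity field...
qty_matches = float(entry["qty"]) == float("1")
# ...and also appears as a word in the comment ("about 1 pound").
comment_matches = "1" in entry["comment"].split()

print(entry["input"])    # 1 boneless pork tenderloin, about 1 pound
print(entry["comment"])  # boneless, about 1 pound
print(qty_matches, comment_matches)  # True True
```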
The second token, `boneless`, should be labelled `B-COMMENT` because no COMMENT token precedes it.
The issue is with `addPrefixes` and `bestTag`. `addPrefixes` determines that `1` is both the QTY and part of the entry's comment, so it gives the possible tags `['B-COMMENT', 'B-QTY']`. For the next token it determines the label is COMMENT, but prefixes it as `I-COMMENT` because the previous token has `B-COMMENT` among its possible tags. `bestTag` then picks any tag over COMMENT, so it assigns `B-QTY` to `1`, and `boneless` is left incorrectly tagged `I-COMMENT`.
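The sequence described above can be reproduced with a simplified, hypothetical re-creation of the two steps. `add_prefixes_sketch` and `best_tag_sketch` are illustrative names that mimic the behaviour reported in this issue; they are not the repository's actual `addPrefixes`/`bestTag` code.

```python
# Hypothetical re-creations of the behaviour described in this issue,
# NOT the repository's actual addPrefixes/bestTag implementations.


def add_prefixes_sketch(data):
    """Prefix each candidate label with B- or I-.

    A label gets I- if it appears among the previous token's *candidate*
    labels, regardless of which candidate that token finally receives.
    """
    prev_labels = None
    out = []
    for token, labels in data:
        tagged = []
        for label in labels:
            in_prev = prev_labels is not None and label in prev_labels
            tagged.append(("I-" if in_prev else "B-") + label)
        out.append((token, tagged))
        prev_labels = labels
    return out


def best_tag_sketch(tags):
    """Return the only tag, or else the first tag that is not a COMMENT tag."""
    if len(tags) == 1:
        return tags[0]
    for tag in tags:
        if not tag.endswith("-COMMENT"):
            return tag
    return tags[0] if tags else None


# Candidate labels for the first two tokens of the example entry:
# "1" matches both the qty field and the comment field.
data = [("1", ["COMMENT", "QTY"]), ("boneless", ["COMMENT"])]

prefixed = add_prefixes_sketch(data)
final = [(tok, best_tag_sketch(tags)) for tok, tags in prefixed]
print(prefixed)  # [('1', ['B-COMMENT', 'B-QTY']), ('boneless', ['I-COMMENT'])]
print(final)     # [('1', 'B-QTY'), ('boneless', 'I-COMMENT')]
```

The prefix for `boneless` is fixed before `best_tag_sketch` discards `B-COMMENT` for `1`, which is exactly the mismatch reported here.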
Essentially, I think `addPrefixes` and `bestTag` should be combined into a single function, since BIO chunking needs to know which tag the previous token will actually receive, not just its possible tags.
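A minimal sketch of how a combined step could work: choose each token's label first, then derive the B-/I- prefix from the label the previous token actually received. The function names, the "prefer anything over COMMENT" rule, and the candidate labels for `pork`/`tenderloin` are assumptions for illustration, not the repository's code.

```python
# Hypothetical combined labelling + BIO prefixing (illustrative names only).


def choose_label(labels):
    """Prefer any non-COMMENT label; fall back to the first label, or None."""
    for label in labels:
        if label != "COMMENT":
            return label
    return labels[0] if labels else None


def tag_bio(data):
    """Return (token, BIO tag) pairs based on the previous token's *chosen* label."""
    out = []
    prev = None
    for token, labels in data:
        label = choose_label(labels)
        if label is None:
            # No candidate label at all: emit None and end any running chunk.
            out.append((token, None))
            prev = None
            continue
        prefix = "I" if label == prev else "B"
        out.append((token, f"{prefix}-{label}"))
        prev = label
    return out


data = [
    ("1", ["COMMENT", "QTY"]),
    ("boneless", ["COMMENT"]),
    ("pork", ["NAME"]),
    ("tenderloin", ["NAME"]),
]
print(tag_bio(data))
# [('1', 'B-QTY'), ('boneless', 'B-COMMENT'), ('pork', 'B-NAME'), ('tenderloin', 'I-NAME')]
```

Because the prefix is computed only after the previous token's label is final, `boneless` correctly starts a new COMMENT chunk.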
It may also be reasonable that, if the first `1` is labelled QTY, the second `1` should be labelled COMMENT, but that is a separate issue from the BIO chunking.
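For that separate suggestion, a hypothetical sketch: once a token string has been assigned QTY, later occurrences of the same string lose the QTY candidate, so the second `1` in "about 1 pound" falls back to COMMENT. The function name and the candidate labels are illustrative assumptions, and the comma token is omitted for brevity.

```python
# Hypothetical sketch: allow QTY only once per token string.


def assign_labels(data):
    """Choose one label per token, preferring non-COMMENT labels,
    but never giving QTY to a token string that already received it."""
    qty_used = set()
    out = []
    for token, labels in data:
        if token in qty_used:
            labels = [label for label in labels if label != "QTY"]
        fallback = labels[0] if labels else None
        label = next((l for l in labels if l != "COMMENT"), fallback)
        if label == "QTY":
            qty_used.add(token)
        out.append((token, label))
    return out


tokens = [
    ("1", ["COMMENT", "QTY"]),
    ("boneless", ["COMMENT"]),
    ("pork", ["NAME"]),
    ("tenderloin", ["NAME"]),
    ("about", ["COMMENT"]),
    ("1", ["COMMENT", "QTY"]),
    ("pound", ["COMMENT"]),
]
print(assign_labels(tokens))
# [('1', 'QTY'), ('boneless', 'COMMENT'), ('pork', 'NAME'), ('tenderloin', 'NAME'),
#  ('about', 'COMMENT'), ('1', 'COMMENT'), ('pound', 'COMMENT')]
```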