Add a new recognizer that leverages an ML model from a framework other than spaCy or Stanza (such as flair).
Decide whether model fine-tuning or training is necessary. If it is, either train or fine-tune a spaCy or Stanza model, or use another framework such as transformers, flair or a CRF.
Separate entity detection into different models, each detecting a single entity (see this paper for inspiration).
Evaluate different fine-tuning and transfer learning techniques.
In structured or semi-structured data, the actual entity value may appear elsewhere in the data. In that case, ad-hoc recognizers can be used to automatically create deny-lists of entity values. For example, if a row contains a first name, a last name and free text, one could create an ad-hoc recognizer that uses the first and last names as a deny-list, so that these names are automatically identified as PII wherever they occur in the free text.
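The ad-hoc deny-list idea can be sketched in plain Python, without Presidio itself. This is only an illustration under stated assumptions: the helper names `build_deny_list` and `find_deny_list_matches`, the row layout and the `PERSON` label are all hypothetical and not part of any Presidio API. In Presidio, the same idea would probably map to a `PatternRecognizer` built with a `deny_list` and passed to the analyzer as an ad-hoc recognizer, but check the current documentation for the exact call.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical match record: (entity label, start offset, end offset, matched text)
Match = Tuple[str, int, int, str]


def build_deny_list(row: Dict[str, str], value_fields: List[str]) -> List[str]:
    """Collect the non-empty values of the structured fields of one row."""
    return [row[f].strip() for f in value_fields if row.get(f, "").strip()]


def find_deny_list_matches(
    text: str, deny_list: List[str], entity: str = "PERSON"
) -> List[Match]:
    """Find whole-word, case-insensitive occurrences of deny-list values in text."""
    if not deny_list:
        return []
    # Try longer values first so they win over shorter values sharing a prefix.
    ordered = sorted(deny_list, key=len, reverse=True)
    alternatives = "|".join(re.escape(value) for value in ordered)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return [(entity, m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


# One row with two structured name fields and a free-text field (made-up data).
row = {
    "first_name": "Dana",
    "last_name": "Levi",
    "notes": "Spoke with Dana today; Mrs. Levi will call back.",
}

deny_list = build_deny_list(row, ["first_name", "last_name"])
matches = find_deny_list_matches(row["notes"], deny_list)

for entity, start, end, value in matches:
    print(entity, start, end, value)
```

The deny-list is rebuilt for every row, so a name is flagged only in the free text of the record it belongs to. That is what makes the recognizer "ad hoc" rather than a global list of names.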
At a very high level, the process for improving the PII detection rate with Presidio is the following: