# Semantically Equivalent Adversarial Rules for Debugging NLP Models

| Title | Semantically Equivalent Adversarial Rules for Debugging NLP Models |
| --- | --- |
| Authors | Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin |
| Year | 2018 |
| URL | http://aclweb.org/anthology/P18-1079 |

Although many NLP models perform well on a held-out test set, they are often brittle when faced with new examples. To expose this weakness, Ribeiro et al. introduce semantically equivalent adversaries: instances that preserve the meaning of the original input but for which the model returns a different prediction. These adversaries let them debug black-box NLP models in domains as varied as visual question answering and sentiment analysis.
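
To make the definition concrete, here is a minimal sketch of adversary detection against a black-box classifier. `find_adversaries`, `toy_model`, and the candidate paraphrases are hypothetical stand-ins for illustration, not the authors' code or data:

```python
def find_adversaries(model_predict, sentence, paraphrases):
    """Return paraphrases that keep the meaning but flip the model's prediction."""
    original_label = model_predict(sentence)
    return [p for p in paraphrases if model_predict(p) != original_label]

# Toy sentiment "model" that is brittle to a doubled question mark.
toy_model = lambda s: "negative" if "??" in s else "positive"

advs = find_adversaries(toy_model, "What is the color?",
                        ["What's the color?", "What is the color??"])
print(advs)  # ['What is the color??'] -- same meaning, different prediction
```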

To generate semantically equivalent adversaries, Ribeiro et al. use machine translation to produce paraphrases. The probability that a candidate paraphrase preserves the semantics of the original sentence is proportional to the number of times it is generated by back-translation through several pivot languages. In this way, they collect semantically equivalent adversarial rules: simple substitutions such as `Wh pronoun + is -> Wh pronoun 's` and `? -> ??` that tend to cause prediction "flips" for between 1% and 4% of the tested instances.
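
The paper's scoring is built on the MT systems' translation probabilities; the sketch below substitutes a cruder proxy, raw counts over round trips, to show the shape of the idea. `toy_back_translate` is a hypothetical stand-in for a real MT service:

```python
from collections import Counter

def score_paraphrases(sentence, back_translate, pivots=("fr", "de", "pt")):
    """Score candidates by how often they come back from round-trip translation."""
    counts = Counter()
    for lang in pivots:
        for candidate in back_translate(sentence, pivot=lang):
            counts[candidate] += 1
    total = sum(counts.values())
    # Relative frequency across pivots; drop the identity paraphrase.
    return {c: n / total for c, n in counts.items() if c != sentence}

# Toy stand-in: a real system would call an MT service for each pivot language.
def toy_back_translate(sentence, pivot):
    return {"fr": ["What's the color?", "What is the color?"],
            "de": ["What's the color?"],
            "pt": ["Which color is it?"]}[pivot]

print(score_paraphrases("What is the color?", toy_back_translate))
# {"What's the color?": 0.5, 'Which color is it?': 0.25}
```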

Ribeiro et al. further show that their rules outperform humans at debugging models. Moreover, when the training data is augmented with rule-generated paraphrases, the retrained models become much less brittle while maintaining their original accuracy.
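
As a rough illustration of such augmentation, the sketch below applies the two example rules from above as regular-expression rewrites; the rule patterns, data, and labels are illustrative only, and the retraining step itself is omitted:

```python
import re

# Illustrative SEAR-style substitution rules from the summary above.
RULES = [
    (r"\b(What|Who|Which|Where) is\b", r"\1's"),  # Wh pronoun + is -> Wh pronoun 's
    (r"\?$", "??"),                               # ? -> ??
]

def augment(examples):
    """Yield each original (text, label) pair plus any rule-rewritten variants."""
    for text, label in examples:
        yield text, label
        for pattern, repl in RULES:
            rewritten = re.sub(pattern, repl, text)
            if rewritten != text:
                yield rewritten, label

train = [("What is the capital of France?", "paris")]
print(list(augment(train)))
# [('What is the capital of France?', 'paris'),
#  ("What's the capital of France?", 'paris'),
#  ('What is the capital of France??', 'paris')]
```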