Title | Semantically Equivalent Adversarial Rules for Debugging NLP Models |
Authors | Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin |
Year | 2018 |
URL | http://aclweb.org/anthology/P18-1079 |
Although many NLP models perform well on a held-out test set, they are often very brittle in the face of new examples. To expose this weakness, Ribeiro et al. introduce the concept of semantically equivalent adversaries, instances with the same meaning for which the model returns different predictions. These adversaries allow them to debug black-box NLP models in domains as varied as visual question answering and sentiment analysis.
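As a minimal sketch of this criterion (the function names, the `sem_eq` scorer, and the threshold `tau` are illustrative assumptions, not the paper's implementation), a semantically equivalent adversary is simply a paraphrase the scorer judges equivalent to the original but for which the model's prediction changes:

```python
from typing import Callable

def is_sea(predict: Callable[[str], str],
           sem_eq: Callable[[str, str], float],
           x: str, x_para: str, tau: float = 0.8) -> bool:
    """True when x_para is scored as semantically equivalent to x (score >= tau)
    yet the black-box model's prediction changes."""
    return sem_eq(x, x_para) >= tau and predict(x) != predict(x_para)
```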
To generate semantically equivalent adversaries, Ribeiro et al. use machine translation to produce paraphrases. The probability that a candidate paraphrase preserves the semantics of the original sentence is taken to be proportional to the number of times it is generated by back-translation through several pivot languages. From individual adversaries they then generalize semantically equivalent adversarial rules (SEARs): simple substitutions such as "Wh-pronoun + is" -> "Wh-pronoun + 's" (e.g., "What is" -> "What's") and "?" -> "??" that cause prediction "flips" for 1% to 4% of the tested instances.
Ribeiro et al. further show that these rules outperform human experts at finding bugs in models. Moreover, when the training data is augmented with rule-generated paraphrases, the retrained models become much less brittle while retaining their accuracy on the original test set.
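A rough sketch of this augmentation step, again assuming rules are plain string substitutions and that each paraphrase inherits the original example's label:

```python
from typing import Iterable, List, Tuple

def augment_with_rules(data: Iterable[Tuple[str, str]],
                       rules: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Append rule-generated paraphrases to the training data, reusing the
    original label for each paraphrase, before retraining the model."""
    data = list(data)
    augmented = list(data)
    for text, label in data:
        for pattern, replacement in rules:
            if pattern in text:
                augmented.append((text.replace(pattern, replacement), label))
    return augmented
```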