Title | Semantically Equivalent Adversarial Rules for Debugging NLP Models |
Authors | Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin |
Year | 2018 |
URL | http://aclweb.org/anthology/P18-1079 |
Although many NLP models perform well on a held-out test set, they are often very brittle in the face of new examples. To expose this weakness, Ribeiro et al. introduce the concept of semantically equivalent adversaries, instances with the same meaning for which the model returns different predictions. These adversaries allow them to debug black-box NLP models in domains as varied as visual question answering and sentiment analysis.
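As a minimal sketch of this criterion (the function names, the `sem_eq` scorer, and the threshold `tau` are illustrative assumptions, not the paper's implementation), a semantically equivalent adversary is simply a paraphrase the scorer judges equivalent to the original but for which the model's prediction changes:

```python
from typing import Callable

def is_sea(predict: Callable[[str], str],
           sem_eq: Callable[[str, str], float],
           x: str, x_para: str, tau: float = 0.8) -> bool:
    """True when x_para is scored as semantically equivalent to x (score >= tau)
    yet the black-box model's prediction changes."""
    return sem_eq(x, x_para) >= tau and predict(x) != predict(x_para)
```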
To generate semantically equivalent adversaries, Ribeiro et al. use machine translation to produce paraphrases. The probability that a candidate paraphrase preserves the semantics of the original sentence is taken to be proportional to the number of times it is generated by back-translation through several pivot languages. From individual adversaries they then generalize semantically equivalent adversarial rules (SEARs): simple substitutions such as "Wh-pronoun + is" -> "Wh-pronoun + 's" (e.g., "What is" -> "What's") and "?" -> "??" that cause prediction "flips" for 1% to 4% of the tested instances.
Ribeiro et al. further show that these rules outperform human experts at finding bugs in models. Moreover, when the training data is augmented with rule-generated paraphrases, the retrained models become much less brittle while retaining their accuracy on the original test set.
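A rough sketch of this augmentation step, again assuming rules are plain string substitutions and that each paraphrase inherits the original example's label:

```python
from typing import Iterable, List, Tuple

def augment_with_rules(data: Iterable[Tuple[str, str]],
                       rules: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Append rule-generated paraphrases to the training data, reusing the
    original label for each paraphrase, before retraining the model."""
    data = list(data)
    augmented = list(data)
    for text, label in data:
        for pattern, replacement in rules:
            if pattern in text:
                augmented.append((text.replace(pattern, replacement), label))
    return augmented
```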