The GLUE Benchmark is a group of nine classification tasks on single sentences or pairs of sentences (a short loading sketch follows the list):
- CoLA (Corpus of Linguistic Acceptability) Determine if a sentence is grammatically correct or not.
- MNLI (Multi-Genre Natural Language Inference) Determine if a sentence entails, contradicts, or is unrelated to a given hypothesis. (This dataset has two versions: matched, where the validation and test sets come from the same distribution as the training data, and mismatched, where they use out-of-domain data.)
- MRPC (Microsoft Research Paraphrase Corpus) Determine if two sentences are paraphrases of one another or not.
- QNLI (Question-answering Natural Language Inference) Determine if the answer to a question is in the second sentence or not. (This dataset is built from the SQuAD dataset.)
- QQP (Quora Question Pairs) Determine if two questions are semantically equivalent or not.
- RTE (Recognizing Textual Entailment) Determine if a sentence entails a given hypothesis or not.
- SST-2 (Stanford Sentiment Treebank) Determine if a sentence has a positive or negative sentiment.
- STS-B (Semantic Textual Similarity Benchmark) Determine the similarity of two sentences with a score from 0 to 5.
- WNLI (Winograd Natural Language Inference) Determine if a sentence with an ambiguous pronoun entails a version of the sentence in which the pronoun is replaced by a candidate referent. (This dataset is built from the Winograd Schema Challenge dataset.)
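
As a minimal sketch of how to get at these tasks, assuming the 🤗 Datasets library and the GLUE configuration names used on the Hugging Face Hub (lowercased task names such as `"cola"`, `"sst2"`, `"mnli"`), each one can be loaded by name:

```python
from datasets import load_dataset

# Load one GLUE task by its configuration name (here CoLA).
cola = load_dataset("glue", "cola")
print(cola)              # DatasetDict with "train", "validation", and "test" splits
print(cola["train"][0])  # a sentence plus its label (1 = acceptable, 0 = not)

# MNLI is the exception: its matched and mismatched versions surface as
# separate splits, e.g. "validation_matched" and "validation_mismatched".
mnli = load_dataset("glue", "mnli")
print(mnli["validation_matched"][0])
```

The same pattern works for the other seven tasks; only the configuration name changes.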