Add HANS dataset #69

Open: wants to merge 2 commits into main

aakanksha19

  1. Evaluated on GPT-2
  2. Time taken: 3:40:59 (h:mm:ss) on a GTX 1080 Ti

Other comments:

  1. The prompt template is the same as the one used for XQUAD/PIAF, with the question "is this true or false?" added so that the answer indicates entailment (true) or non-entailment (false).
  2. In addition to overall accuracy, the finer-grained metrics computed by the HANS evaluation script (https://github.com/tommccoy1/hans/blob/master/evaluate_heur_output.py) are also reported; they can be removed if deemed unnecessary.
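As a rough illustration of what a per-heuristic breakdown involves, here is a minimal sketch; it is not the PR's code. The `(heuristic, gold_label, predicted_label)` record layout and the `grouped_accuracy` name are hypothetical. The heuristic names (`lexical_overlap`, `subsequence`, `constituent`) and gold labels (`entailment`, `non-entailment`) are the ones used in HANS. Any further breakdowns in the HANS script are not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (heuristic, gold_label, predicted_label).
Record = Tuple[str, str, str]


def grouped_accuracy(records: Iterable[Record]) -> Dict[str, float]:
    """Overall accuracy plus accuracy per (heuristic, gold label), in percent."""
    correct: Counter = Counter()
    total: Counter = Counter()
    for heuristic, gold, pred in records:
        key = (heuristic, gold)
        total[key] += 1
        correct[key] += int(gold == pred)
    if not total:
        # No examples: nothing to report (avoids dividing by zero).
        return {}
    metrics = {"overall": 100.0 * sum(correct.values()) / sum(total.values())}
    for (heuristic, gold), n in total.items():
        metrics[f"{heuristic}_{gold}"] = 100.0 * correct[(heuristic, gold)] / n
    return metrics
```

Reporting entailed and non-entailed accuracy separately for each heuristic matters for HANS. A model that always predicts "entailment" can score well on the entailed subsets while failing the non-entailed ones, and a single overall number would hide this.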

aakanksha19 changed the title from "Adding HANS dataset" to "Add HANS dataset" on Oct 2, 2021
wilsonyhlee (Member) left a comment


Left a few nit comments, but looks great overall. Thanks for this PR!

Also, can you fix the code styling issues (by running make quality)?

```python
# Code under review (indentation restored)
self.metrics["hans_overall_accuracy"] = accuracy / len(dataset) * 100
for heuristic in dataset.heuristics:
    # Accuracy (%) on the entailed examples for this heuristic
    total = heuristic_ent_correct_count_dict[heuristic] + heuristic_ent_incorrect_count_dict[heuristic]
    self.metrics["hans_{}_entailed_accuracy".format(heuristic)] = heuristic_ent_correct_count_dict[heuristic] / total * 100
```

nit: The logic to calculate accuracy is repeated a few times in this function. I think it'd be useful to create a helper method for it.
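One possible shape for the suggested helper is sketched below under stated assumptions. `percent_accuracy` and `heuristic_metrics` are hypothetical names. The non-entailed count dicts and the `hans_{heuristic}_non_entailed_accuracy` key are assumed counterparts of the entailed ones shown in the diff. Returning 0.0 for an empty group is also an assumed convention.

```python
from typing import Dict, Iterable, Mapping


def percent_accuracy(correct: int, incorrect: int) -> float:
    """Accuracy as a percentage; 0.0 for an empty group (assumed convention)."""
    total = correct + incorrect
    return 100.0 * correct / total if total else 0.0


def heuristic_metrics(
    heuristics: Iterable[str],
    ent_correct: Mapping[str, int],
    ent_incorrect: Mapping[str, int],
    nonent_correct: Mapping[str, int],
    nonent_incorrect: Mapping[str, int],
) -> Dict[str, float]:
    """Per-heuristic entailed / non-entailed accuracy via the shared helper."""
    metrics: Dict[str, float] = {}
    for heuristic in heuristics:
        metrics[f"hans_{heuristic}_entailed_accuracy"] = percent_accuracy(
            ent_correct[heuristic], ent_incorrect[heuristic]
        )
        metrics[f"hans_{heuristic}_non_entailed_accuracy"] = percent_accuracy(
            nonent_correct[heuristic], nonent_incorrect[heuristic]
        )
    return metrics
```

Inside the task class, `percent_accuracy` could just as well be a private method. The overall metric could reuse it as `percent_accuracy(accuracy, len(dataset) - accuracy)`, assuming `accuracy` holds a count of correct predictions, as `accuracy / len(dataset) * 100` suggests.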

evaluation/tasks/hans/hans.py (outdated, resolved)
3 participants