This repo contains the code used in the decoding experiments across various NLG fairness metrics from this paper, which can be cited as follows:
@inproceedings{sheng2021societal,
  title={Societal Biases in Language Generation: Progress and Challenges},
  author={Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
  booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing},
  year={2021}
}
- The regard metric is from The Woman Worked as a Babysitter: On Biases in Language Generation. The code + classifier can be found here.
- The African American English/White-Aligned English (AAE/WAE) evaluations are from Investigating African-American Vernacular English in Transformer-Based Text Generation, and the dataset can be found here.
- The individual fairness/group fairness (IF/GF) distributional metrics are from Reducing Sentiment Bias in Language Models via Counterfactual Evaluation.
- The gendered word co-occurrence score metric is from Identifying and Reducing Gender Bias in Word-Level Language Models.
- data/female_word_list.txt and data/male_word_list.txt are taken from here.
To run the scripts, first set up the environment:
conda create --name decoding-biases python==3.7
conda activate decoding-biases
pip install -r requirements.txt
To generate samples, you can run:
python generate.py \
--evaluation regard \
--model_type gpt2 \
--decode_type greedy
Run python generate.py -h to see all options.
To run the aae-wae generation/evaluation, you'll have to contact the authors of the dataset here to obtain the prompts, and then put the aae_samples.tsv and wae_samples.tsv files in data/.
The current script will generate 100 samples per prompt if the evaluation is regard
and 1 sample per prompt for all other evaluations, consistent with what is described in the original paper.
To run the regard evaluations on the generated samples, you'll have to first download the regard classifier here.
Since the classifier was trained with demographics masked out with "XYZ", we suggest doing the same with the generated samples.
In other words, you can take the file of samples generated with generate.py (e.g., gpt2.greedy.regard.csv), replace demographics with XYZ, input the resulting file to the regard classifier, and use the classifier's output file as the regard_file below.
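A minimal sketch of this masking step is shown below. The demographic strings and the output file name are assumptions (based on the demographics prompted in the regard paper); adjust them to whatever demographics appear in your generated samples.

# mask_demographics.py: replace demographic mentions with "XYZ" before
# running the regard classifier. Illustration only -- the DEMOGRAPHICS list
# and output file name are assumptions, not part of this repo's scripts.
import csv

DEMOGRAPHICS = [
    "The Black person", "The White person",
    "The woman", "The man",
    "The gay person", "The straight person",
]

with open("gpt2.greedy.regard.csv", newline="") as f_in, \
        open("gpt2.greedy.regard.masked.csv", "w", newline="") as f_out:
    writer = csv.writer(f_out)
    for row in csv.reader(f_in):
        masked_row = []
        for field in row:
            # Replace every occurrence of a demographic mention with XYZ.
            for demo in DEMOGRAPHICS:
                field = field.replace(demo, "XYZ")
            masked_row.append(field)
        writer.writerow(masked_row)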
To then run the regard evaluation:
python evaluate.py \
--evaluation regard \
--model_type gpt2 \
--decode_type greedy \
--regard_file [prediction file from regard classifier] \
--unmasked_regard_file gpt2.greedy.regard.csv
To run the aae-wae evaluation:
python evaluate.py \
--evaluation aae-wae \
--model_type gpt2 \
--decode_type greedy \
--aae_wae_sentiment_file gpt2.greedy.aae-wae.csv
To run the IF/GF evaluation:
python evaluate.py \
--evaluation distrib \
--model_type gpt2 \
--decode_type greedy \
--distrib_file gpt2.greedy.distrib.csv
To run the gendered word co-occurrence score evaluation as described in the original paper, you'll have to have generated samples for the other evaluations: regard, distrib, and aae-wae. Then, run the following:
python evaluate.py \
--evaluation ratio \
--model_type gpt2 \
--decode_type greedy \
--regard_file [prediction file from regard classifier] \
--unmasked_regard_file gpt2.greedy.regard.csv \
--distrib_file gpt2.greedy.distrib.csv \
--aae_wae_sentiment_file gpt2.greedy.aae-wae.csv
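For intuition, here is a rough sketch of how a gendered word co-occurrence score can be computed over generated text, in the spirit of the metric cited above. This is an illustration only, not the implementation in evaluate.py; the input file name and the fixed context window are assumptions.

# cooccurrence_sketch.py: illustrative gendered word co-occurrence score.
# Counts how often each token appears near female vs. male words, then
# reports the mean absolute log ratio of the normalized co-occurrence rates.
import math
from collections import Counter

def load_words(path):
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

female_words = load_words("data/female_word_list.txt")
male_words = load_words("data/male_word_list.txt")

window = 10  # context window (in tokens) around each word; an assumption
female_counts, male_counts = Counter(), Counter()
with open("generated_samples.txt") as f:  # assumed: one generated sample per line
    for line in f:
        tokens = line.lower().split()
        for i, tok in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if any(w in female_words for w in context):
                female_counts[tok] += 1
            if any(w in male_words for w in context):
                male_counts[tok] += 1

# Bias of a word: log ratio of its normalized co-occurrence with female vs. male words.
f_total = sum(female_counts.values()) or 1
m_total = sum(male_counts.values()) or 1
biases = [
    math.log((female_counts[t] / f_total) / (male_counts[t] / m_total))
    for t in set(female_counts) & set(male_counts)
]
print("mean absolute bias:", sum(abs(b) for b in biases) / max(len(biases), 1))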