Skip to content

[ACL 2020] Towards Debiasing Sentence Representations

License

Notifications You must be signed in to change notification settings

pliang279/sent_debias

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Towards Debiasing Sentence Representations

Pytorch implementation for debiasing sentence representations.

This implementation contains code for removing bias from BERT representations and evaluating bias level in BERT representations.

Correspondence to:

Paper

Towards Debiasing Sentence Representations
Paul Pu Liang, Irene Li, Emily Zheng, Yao Chong Lim, Ruslan Salakhutdinov, and Louis-Philippe Morency
ACL 2020

If you find this repository useful, please cite our paper:

@inproceedings{liang-etal-2020-towards,
    title = "Towards Debiasing Sentence Representations",
    author = "Liang, Paul Pu  and
      Li, Irene Mengze  and
      Zheng, Emily  and
      Lim, Yao Chong  and
      Salakhutdinov, Ruslan  and
      Morency, Louis-Philippe",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.488",
    doi = "10.18653/v1/2020.acl-main.488",
    pages = "5502--5515",
}

Installation

First check that the requirements are satisfied:
Python 3.6
torch 1.2.0
huggingface transformers
numpy 1.18.1
sklearn 0.20.0
matplotlib 3.1.2
gensim 3.8.0
tqdm 4.45.0
regex 2.5.77
pattern3

The next step is to clone the repository:

git clone https://github.com/pliang279/sent_debias.git

To install bert models, go to debias-BERT/, run pip install .

Data

Download the GLUE data by running this script:

python download_glue_data.py --data_dir glue_data --tasks SST,QNLI,CoLA

Unpack it to some directory $GLUE_DIR.

Precomputed models and embeddings (optional)

  1. Models

  2. Embeddings

Usage

If you choose to use precomputed models and embeddings, skip to step B. Otherwise, follow step A and B sequentially.

A. Fine-tune BERT

  1. Go to debias-BERT/experiments.
  2. Run export TASK_NAME=SST-2 (task can be one of SST-2, CoLA, and QNLI).
  3. Fine tune BERT on $TASK_NAME.
    • With debiasing
      python run_classifier.py \
      --data_dir $GLUE_DIR/$TASK_NAME/ \
      --task_name $TASK_NAME \
      --output_dir path/to/results_directory \
      --do_train \
      --do_eval \
      --do_lower_case \
      --debias \
      --normalize \
      --tune_bert 
      
    • Without debiasing
      python run_classifier.py \
      --data_dir $GLUE_DIR/$TASK_NAME/ \
      --task_name $TASK_NAME \
      --output_dir path/to/results_directory \
      --do_train \
      --do_eval \
      --do_lower_case \
      --normalize \
      --tune_bert 
      
    The fine-tuned model and dev set evaluation results will be stored under the specified output_dir.

B. Evaluate bias in BERT representations

  1. Go to debias-BERT/experiments.

  2. Run export TASK_NAME=SST-2 (task can be one of SST-2, CoLA, and QNLI).

  3. Evaluate fine-tuned BERT on bias level.

    • Evaluate debiased fine-tuned BERT.
        python eval_bias.py \
        --debias \
        --model_path path/to/model \
        --model $TASK_NAME \
        --results_dir path/to/results_directory \
        --output_name debiased
      
      If using precomputed models, set model_path to acl2020-results/$TASK_NAME/debiased.
    • Evaluate biased fine-tuned BERT.
        python eval_bias.py \
        --model_path path/to/model \
        --model $TASK_NAME \
        --results_dir path/to/results_directory \
        --output_name biased
      
      If using precomputed models, set model_path to acl2020-results/$TASK_NAME/biased.

    The evaluation results will be stored in the file results_dir/output_name.

    Note: The argument model_path should be specified as the output_dir corresponding to the fine-tuned model you want to evaluate. Specifically, model_path should be a directory containing the following files: config.json, pytorch_model.bin and vocab.txt.

  4. Evaluate pretrained BERT on bias level.

    • Evaluate debiased pretrained BERT.
      python eval_bias.py \
      --debias \
      --model pretrained \
      --results_dir path/to/results_directory \
      --output_name debiased 
      
    • Evaluate biased pretrained BERT.
      python eval_bias.py \
      --model pretrained \
      --results_dir path/to/results_directory \
      --output_name biased 
      

    Again, the bias evaluation results will be stored in the file results_dir/output_name.