This repository contains the code for the paper Fairness Under Demographic Scarce Regime (FairDSR). Demographic Scarce Regime refers to settings where demographic information (the sensitive attribute) is not fully available. The paper studies which properties of the sensitive-attribute classifier affect the fairness-accuracy tradeoff of the downstream classifier, and demonstrates that applying fairness constraints only on samples with lower uncertainty in the sensitive attribute yields better fairness-accuracy tradeoffs.
The project requires the following Python packages:
- numpy
- pandas
- h5py
- mpi4py
- scikit-learn
- tensorflow
- folktables
- fairlearn
- tensorboard
- torchvision
Other dependencies are listed in the requirements.txt file.
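Assuming requirements.txt is at the repository root, the dependencies can be installed with:

pip install -r requirements.txt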
Download each dataset and store the preprocessed files in the folder preprocessing. For each dataset, create two separate CSV files, one for each subset.
Run the following command to train the fair model using samples with low uncertainty in the sensitive attribute, with uncertainty measured via conformal prediction:
python3 src/Proxy_Evaluation.py --sensitive_feature_type=cp --cp_alpha 0.05 --seed=1 --dataset adult --base_model lr --fair_metric dp
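For intuition, here is a minimal sketch of how low-uncertainty samples can be selected with split conformal prediction. It assumes integer-encoded attribute labels and a scikit-learn classifier; the names are illustrative, not the repository's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def certain_indices(X_labeled, s_labeled, X_target, alpha=0.05):
    # Hold out a calibration split for conformal prediction.
    X_fit, X_cal, s_fit, s_cal = train_test_split(
        X_labeled, s_labeled, test_size=0.3, random_state=0
    )
    clf = LogisticRegression(max_iter=1000).fit(X_fit, s_fit)

    # Nonconformity score: 1 - probability assigned to the true attribute.
    cal_probs = clf.predict_proba(X_cal)
    scores = 1.0 - cal_probs[np.arange(len(s_cal)), s_cal]

    # Conformal quantile: prediction sets built from it cover the true
    # attribute with probability at least 1 - alpha.
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

    # Keep samples whose prediction set is a singleton, i.e. exactly one
    # attribute value clears the threshold: these are the "certain" samples.
    pred_sets = clf.predict_proba(X_target) >= 1.0 - q
    return np.flatnonzero(pred_sets.sum(axis=1) == 1)
```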
The file src/demographic_predictor.py contains the code to train the attribute classifier with uncertainty estimation.
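One common way to obtain such uncertainty estimates is Monte Carlo dropout; the sketch below illustrates that idea assuming a Keras model with dropout layers, and is not necessarily the method implemented in this file:

```python
import numpy as np

def mc_dropout_uncertainty(model, X, n_samples=30):
    # Keep dropout active at inference time (training=True) to draw samples
    # from the predictive distribution of a Keras model.
    probs = np.stack([model(X, training=True).numpy() for _ in range(n_samples)])
    mean = probs.mean(axis=0)  # shape: (n_points, n_classes)
    # Entropy of the averaged prediction: higher means more uncertain.
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=1)
    return mean.argmax(axis=1), entropy
```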
The file src/Proxy_Evaluation.py contains the code to train and evaluate fair models with different attribute-classifier baselines (proxies).
Assuming nbr_core is the number of cores you want to use:
cd src
mpiexec -n nbr_core python Proxy_Evaluation.py
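A plausible pattern for how the runs can be split across cores with mpi4py (illustrative; run_experiment is a hypothetical stand-in for the per-seed logic in Proxy_Evaluation.py):

```python
from mpi4py import MPI

def run_experiment(seed):
    # Hypothetical stand-in for one per-seed training/evaluation run.
    print(f"running seed {seed}")

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # index of this process, in [0, nbr_core)
size = comm.Get_size()  # total number of processes (nbr_core)

# Each rank takes a disjoint slice of the seeds, so seeds run in parallel.
for seed in range(10)[rank::size]:
    run_experiment(seed)
```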
Provide the number of cores in your submission file and srun will use all the cores available:
cd src
srun python Proxy_Evaluation.py
Proxy_Evaluation.py accepts the following parameters (a short usage sketch combining some of them follows the list):
- dataset (string): Specify the dataset to be used: adult, compas_race, new_adult, lsac_sex, celeba_attract.
- seed (int): Random seed.
- fair_metric (dp, eodds, eop): Fairness metric:
  - dp: Demographic Parity
  - eop: Equal Opportunity
  - eodds: Equalized Odds
- demographic_predictor (DNN, KNN): Model used to infer the sensitive attribute from the related features. This parameter is used when sensitive_feature_type is predicted. Possible values are DNN and KNN:
  - DNN: uses an MLP-based attribute classifier
  - KNN: uses a KNN-based attribute classifier (imputation)
- is_adv_method (boolean): Whether to use the adversarial debiasing method.
- base_model (string): Base classifier for the reduction methods:
  - lr: LogisticRegression
  - rf: RandomForest
  - gbm: GradientBoostingClassifier
- sensitive_feature_type (string): Use clean or predicted sensitive features:
  - clean: apply the fairness mechanism w.r.t. the ground-truth sensitive attributes
  - ours: apply the fairness mechanism w.r.t. the most certain predicted sensitive attributes
  - cp: apply the fairness mechanism w.r.t. the most certain predicted sensitive attributes, using conformal prediction
  - predicted: apply the fairness mechanism w.r.t. the MLP- or KNN-based attribute classifier
- cp_alpha (float): Controls the coverage of the prediction sets used in conformal prediction (the sets cover the true attribute with probability at least 1 - cp_alpha).
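For concreteness, here is a minimal sketch of how a reduction-based fair classifier with base_model=lr and fair_metric=dp can be assembled with fairlearn; the function and variable names are illustrative, not the repository's exact code path:

```python
from fairlearn.metrics import demographic_parity_difference
from fairlearn.reductions import DemographicParity, ExponentiatedGradient
from sklearn.linear_model import LogisticRegression

def train_fair_model(X_train, y_train, s_train, X_test, y_test, s_test):
    # s_train holds the sensitive attribute used as the fairness signal; with
    # sensitive_feature_type=cp it would contain only the samples whose
    # conformal prediction set is a singleton.
    mitigator = ExponentiatedGradient(
        LogisticRegression(max_iter=1000),  # base_model=lr
        constraints=DemographicParity(),    # fair_metric=dp
    )
    mitigator.fit(X_train, y_train, sensitive_features=s_train)
    y_pred = mitigator.predict(X_test)
    # Demographic parity gap of the resulting model on held-out data.
    gap = demographic_parity_difference(y_test, y_pred, sensitive_features=s_test)
    return mitigator, gap
```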
Use the file src/target_predictor.py to train the target classifier with fairness mechanisms not supported by fairlearn.
python src/target_predictor.py --dataset $DATASET --baseline $BASELINE
target_predictor.py accepts the following parameters (a simplified sketch of the robust-loss idea behind the DRO/CVAR baselines follows this list):
- baseline (ARL, DRO, CVAR, VANILLA, FAIRDA): Specify the baseline used to train the target classifier:
  - ARL: train the classifier with Adversarially Reweighted Learning (ARL) by Lahoti et al. (2020).
  - DRO: train the classifier with a robust loss via distributionally robust optimization (DRO) by Hashimoto et al. (2018).
  - CVAR: train the classifier with a robust, KL-regularized loss (fast DRO) by Levy et al. (2020).
  - FAIRDA: train the classifier with FairDA by Liang et al.
  - VANILLA: train the classifier without fairness constraints.
- dataset (string): Specify the dataset to be used: adult, compas_race, new_adult, lsac_sex, celeba_attract.
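As intuition for the robust baselines, the CVaR objective replaces the average loss with the average over the worst-performing alpha-fraction of samples. A minimal PyTorch sketch of this idea (simplified; the repository's DRO/CVAR implementations follow Hashimoto et al. and Levy et al. and may differ in detail):

```python
import torch

def cvar_loss(per_sample_losses: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    # per_sample_losses: unreduced losses, shape (batch_size,).
    # Average only the k largest losses, so hard samples (often those from
    # underrepresented groups) dominate the training objective.
    k = max(1, int(alpha * per_sample_losses.numel()))
    worst, _ = torch.topk(per_sample_losses, k)
    return worst.mean()
```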
Use the file src/Unfair_Evaluation.py to run the baselines without fairness constraints. It uses the parameters --base_model, --dataset, and --seed as described above.
Assuming you have run all the experiments for every baseline, the results for each baseline and each seed are stored in the folder output.
To aggregate the results across seeds, run the file analysis/compute_average.py with the argument --dataset specifying the dataset.
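A minimal sketch of what this aggregation amounts to, assuming per-seed result files in CSV form; the file layout and column names here are illustrative, not the repository's exact format:

```python
import glob

import pandas as pd

# Collect the per-seed result files for one dataset (illustrative paths).
frames = [pd.read_csv(path) for path in glob.glob("output/adult/*.csv")]
results = pd.concat(frames, ignore_index=True)

# Mean and standard deviation across seeds for each baseline.
summary = results.groupby("method")[["accuracy", "dp_gap"]].agg(["mean", "std"])
print(summary)
```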
The notebook analysis/plots.ipynb has functions to plot the results in the paper. The plots are saved in the folder analysis/results/plots.
For this experiment, use the file src/Ablation_Proxy_Evaluation.py to train the fair classifier for different uncertainty thresholds, setting the parameter --treshold_uncert to define the confidence threshold. The results for each baseline and each seed are stored in the folder output/{dataset}/ablation, where {dataset} is specified with the parameter --dataset as described above.
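Conceptually, the ablation sweeps a threshold and keeps only the samples whose attribute-prediction uncertainty falls below it. A minimal sketch of that selection step (names and values are illustrative):

```python
import numpy as np

# Stand-in for per-sample uncertainty scores from the attribute classifier.
uncertainty = np.random.rand(1000)

for threshold in [0.1, 0.2, 0.3, 0.4, 0.5]:
    # Indices of samples considered certain enough to receive fairness constraints.
    idx = np.flatnonzero(uncertainty < threshold)
    # ...train the fair classifier on this subset and record the tradeoff...
    print(f"threshold={threshold}: {len(idx)} samples selected")
```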
If you use (parts of) this code, please cite:
@article{
kenfack2024b,
title={Fairness Under Demographic Scarce Regime},
author={Patrik Joslin Kenfack and Samira Ebrahimi Kahou and Ulrich A{\"\i}vodji},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2024},
url={https://openreview.net/forum?id=TB18G0w6Ld},
note={}
}