Learning to defer (L2D) aims to improve human-AI collaboration systems by deferring decisions to humans when they are more likely to make the correct judgment than an ML classifier. Existing research in L2D overlooks key aspects of real-world systems that impede its practical adoption, such as: i) neglecting cost-sensitive scenarios, where type 1 and type 2 errors carry different costs; ii) requiring concurrent human predictions for every training instance; and iii) not dealing with human work-capacity constraints. To address these issues, we propose the Deferral under Cost and Capacity Constraints Framework (DeCCaF) - a novel L2D approach that employs supervised learning to model the probability of human error under less restrictive data requirements (only one expert prediction per instance) and uses constraint programming to globally minimize the error cost subject to workload constraints. We test DeCCaF in a series of cost-sensitive fraud detection scenarios with different teams of 9 synthetic fraud analysts, each subject to individual work-capacity constraints. We demonstrate that our approach performs significantly better than the baselines in a wide array of scenarios, achieving an average reduction in misclassification cost of 8.4%.
In order to ensure complete reproducibility, we provide users with:
- Code used to run the experiments.
- Datasets, models and results used/produced in our experiments:
  - Synthetically generated data - expert predictions, training scenarios and capacity constraints.
  - ML models - Alert Model, OvA Classifiers and Human Expertise Model.
  - Results - the set of assignments and decisions resulting from the deferral experiments.
Note: this data is included because LightGBM models are known to produce different results depending on the operating system, Python version, and number of cores used in training, among other factors.
The submitted version of the paper and the appendix are available here.
Requirements:
- miniforge3
Before using any of the provided code, to ensure reproducibility, please create and activate the Python environment by running:

```
conda env create -f environment.yml
conda activate deccaf-env
```
To replicate the generation of the synthetic data, as well as our experiments, please execute the following steps:
Attention: Run each Python script from inside the folder where it is located, so that the relative paths within each script resolve correctly.
After cloning the repo, please extract the "Datasets, models and results" file inside the repo's folder, ensuring that your directory structure looks like the tree shown below.
Note that, during the following steps, the training of models and the generation of results will be skipped if the output files already exist within the Data_and_models folder. This was done to allow complete reproducibility and inspection of the source files used in the paper. If you wish to run the experiments from scratch, you will have to delete every folder and file within the Data_and_models directory, except for the following (a sketch of one way to do this is shown after the note below):
- Data_and_models/data/Base.csv: the raw version of the BAF dataset.
- Data_and_models/experts/: the synthetic expert predictions used in the paper.
- Data_and_models/testbed/: the expert capacity constraints and batches used in the paper.
The code for the expert and testbed generation is not made available, as the expert simulation framework was submitted as a contribution to a different venue that focuses on synthetic data generation.
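As a convenience, the snippet below sketches one way to clear previously generated outputs while preserving the three items above. It is not part of the original instructions; the `find` filter is an assumption about your local layout, so review the listed files before changing `-print` to `-delete`.

```
# Illustrative only: list everything under Data_and_models except the raw
# dataset, the expert predictions and the testbed. Delete the listed items
# by hand, or replace -print with -delete after checking the output.
cd Data_and_models
find . -mindepth 1 \
    -not -path './data' -not -path './data/Base.csv' \
    -not -path './experts' -not -path './experts/*' \
    -not -path './testbed' -not -path './testbed/*' \
    -print
```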
```
deccaf
│   README.md
│   .gitignore
│   environment.yml
│
└───Code
│   │   ...
│
└───Data_and_models
    │   ...
```
First, activate the Python environment with the necessary dependencies, as described in the Requirements section above.
To train the Alert Model, run Code/alert_model/training_and_predicting.py, which trains the model and scores all instances in months 4-8 of the BAF dataset.
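Following the earlier note about running each script from within its own folder, this step could look like the sketch below (it assumes you start at the repo root; adjust paths if your checkout differs):

```
cd Code/alert_model
python training_and_predicting.py   # trains the Alert Model and scores months 4-8
cd ../..
```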
Then, run Code/data/preprocess.py to create the dataset of 30K alerts raised in months 4-8. This is the set of instances used in all following steps.
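A possible invocation for this step, under the same assumption of starting at the repo root:

```
cd Code/data
python preprocess.py   # builds the 30K-alert dataset used by all later steps
cd ../..
```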
As both DeCCaF and OvA share the classifier h, we train it before the expert models, by running the script Code/classifier_h/training.py; its performance is used as the reference for generating experts with a similar misclassification cost.
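Similarly, a sketch for this step:

```
cd Code/classifier_h
python training.py   # trains classifier h, shared by DeCCaF and OvA
cd ../..
```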
To train the DeCCaF system, run the script Code/expert_models/run_deccaf.py; to train the OvA system, run the script Code/expert_models/run_ova.py.
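One possible way to run both training scripts, again from the repo root:

```
cd Code/expert_models
python run_deccaf.py   # trains the DeCCaF system
python run_ova.py      # trains the OvA system
cd ../..
```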
To reproduce the deferral testing, run the script Code/deferral/run_alert.py. The results can then be evaluated with the notebook Code/deferral/process_results.ipynb.
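And a sketch for the deferral run itself (evaluation of the results via the notebook is covered below):

```
cd Code/deferral
python run_alert.py   # reproduces the deferral testing
cd ../..
```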
We include notebooks to facilitate analysis of:
- Synthetic Experts' Decisions
- ML Model, Human Expertise Model and OvA Classifiers
- Deferral Results
We also facilitate further analysis of our generated experts and the conducted benchmarks by providing users with two Jupyter notebooks (see the launch example after this list):
- Code/deferral/process_results.ipynb - which contains:
  - the evaluation of the deferral performance of all considered L2D baselines;
  - the evaluation of the performance and calibration of Classifier h, the OvA Classifiers, and DeCCaF's team correctness prediction models.
- Code/synthetic_experts/expert_analysis.ipynb - which contains the evaluation of the properties of the synthetic experts' decision-making processes.
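One way to open them, assuming Jupyter is available inside the deccaf-env environment (an assumption; the environment file may or may not include it):

```
# Open whichever notebook you want to inspect (each command starts a Jupyter server)
jupyter notebook Code/deferral/process_results.ipynb
jupyter notebook Code/synthetic_experts/expert_analysis.ipynb
```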