Code to reproduce results from the paper:
CROWDLAB: Supervised learning to infer consensus labels and quality scores for data with multiple annotators
NeurIPS 2022 Human in the Loop Learning Workshop
This repository benchmarks algorithms that estimate:
- A consensus label for each example that aggregates the individual annotations.
- A confidence score for the correctness of each consensus label.
- A rating for each annotator which estimates the overall correctness of their labels.
This repository is intended only for scientific purposes. To apply the CROWDLAB algorithm to your own multi-annotator data, you should instead use the implementation from the official cleanlab library.
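For reference, here is a minimal sketch of applying cleanlab's multi-annotator functionality to your own data (assuming cleanlab >= 2.1; the toy data below is made up for illustration, and the exact argument names and returned keys may differ between cleanlab versions):

```python
# Minimal sketch (not part of this benchmark repo); assumes cleanlab >= 2.1.
# labels_multiannotator: (num_examples, num_annotators) DataFrame with NaN where
# an annotator did not label an example; pred_probs: out-of-sample predicted
# class probabilities from any trained classifier.
import numpy as np
import pandas as pd
from cleanlab.multiannotator import get_label_quality_multiannotator

labels_multiannotator = pd.DataFrame(
    [[0, 0, np.nan], [1, np.nan, 1], [np.nan, 2, 2]],
    columns=["annotator_1", "annotator_2", "annotator_3"],
)
pred_probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])

results = get_label_quality_multiannotator(labels_multiannotator, pred_probs)
print(results["label_quality"])    # consensus label + quality score per example
print(results["annotator_stats"])  # overall quality rating per annotator
```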
Code to benchmark methods for active learning with multiple data annotators can be found in the `active_learning_benchmarks` folder.
To run the model training and benchmarks, install the following dependencies:
```
pip install ./cleanlab
pip install ./crowd-kit
pip install -r requirements.txt
```
Note that our `cleanlab/` and `crowd-kit/` folders here contain forks of the cleanlab and crowd-kit libraries. These forks differ from the main libraries as follows:
- The `cleanlab` fork contains various multi-annotator algorithms studied in the benchmark (to obtain consensus labels and compute consensus and annotator quality scores) that are not present in the main library.
- The `crowd-kit` fork addresses some numeric underflow issues in the original library (needed for properly ranking examples by their quality). Instead of operating directly on probabilities, our fork performs calculations on log-probabilities via the log-sum-exp trick.
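For illustration, the standard log-sum-exp trick normalizes a vector of log-probabilities without ever exponentiating very small numbers directly (a generic sketch, not code taken from the fork):

```python
import numpy as np

def normalize_log_probs(log_p):
    """Return normalized probabilities from unnormalized log-probabilities.

    Subtracting the max before exponentiating avoids underflow when the
    log-probabilities are very negative (e.g. products of many small terms).
    """
    log_p = np.asarray(log_p, dtype=float)
    m = log_p.max()
    log_norm = m + np.log(np.exp(log_p - m).sum())  # log-sum-exp
    return np.exp(log_p - log_norm)

# Directly exponentiating these values underflows to 0.0 in float64,
# but the log-domain computation recovers the correct relative weights.
print(normalize_log_probs([-1000.0, -1001.0, -1002.0]))
# -> approximately [0.665, 0.245, 0.090]
```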
To benchmark the various multi-annotator algorithms using predictions from already-trained classifier models, run the following notebooks:
- `benchmark.ipynb` - runs the benchmarks and saves the results to CSV
- `benchmark_results_[...].ipynb` - visualizes the benchmark results in plots
To generate the multi-annotator datasets and train the image classifier considered in our benchmarks, run the following notebooks:
- `preprocess_data.ipynb` - preprocesses the dataset
- `create_labels_df.ipynb` - generates correct absolute label paths for images in the preprocessed data
- `xval_model_train.ipynb` / `xval_model_train_perfect_model.ipynb` - trains a model and obtains predicted class probabilities for each image
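As a rough illustration of the kind of output the training notebooks produce, out-of-sample predicted class probabilities can be obtained via cross-validation, for example with scikit-learn (a generic sketch on synthetic data; the actual notebooks train an image classifier with their own data pipeline):

```python
# Generic sketch of obtaining out-of-sample predicted class probabilities via
# cross-validation; the notebooks in this repo train an image classifier instead.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=0)

# Each example's probabilities come from a model that never saw that example.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
)
print(pred_probs.shape)  # (500, 3) -- one row of class probabilities per example
```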