ActiveEval: Active evaluation of classifiers

ActiveEval is a Python framework for active evaluation of classifiers. It addresses the problem of estimating the performance of a classifier on an unlabeled pool (test set), using labels queried from an oracle (e.g. an expert or a crowdsourcing platform). The implemented sampling methods include passive sampling, stratified sampling, static importance sampling and adaptive importance sampling. The importance sampling methods aim to minimize the variance of the estimated performance measure, and can therefore yield more precise estimates than passive sampling for a given label budget. The evaluation measures currently supported include accuracy, F-measure and precision-recall curves. The package is designed to be extensible.
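
To illustrate why importance sampling can help under class imbalance, here is a minimal NumPy sketch that does not use ActiveEval itself. It compares passive (uniform) sampling with a hand-picked proposal that favours predicted positives when estimating the fraction of true positives in a pool. The synthetic data, the 0.01 floor on the proposal and all variable names are assumptions made for illustration only; they are not the proposals implemented by the package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pool with rare positives; all numbers below are illustrative.
n_pool, n_labels, n_trials = 50_000, 1_000, 200
y_true = rng.random(n_pool) < 0.01        # oracle labels, hidden in practice
flip = rng.random(n_pool) < 0.02          # classifier errs on about 2% of the pool
y_pred = np.where(flip, ~y_true, y_true)

# Target quantity: fraction of the pool that is a true positive.
f = (y_true & y_pred).astype(float)
truth = f.mean()

# Passive sampling draws uniformly. The proposal favours predicted positives
# but keeps non-zero mass everywhere so the weights stay well defined.
uniform = np.full(n_pool, 1.0 / n_pool)
proposal = np.where(y_pred, 1.0, 0.01)
proposal /= proposal.sum()


def estimate(q):
    """Importance-weighted mean of f from n_labels draws (with replacement) from q."""
    idx = rng.choice(n_pool, size=n_labels, p=q)
    weights = uniform[idx] / q[idx]       # target density over proposal density
    return np.mean(weights * f[idx])


passive = np.array([estimate(uniform) for _ in range(n_trials)])
weighted = np.array([estimate(proposal) for _ in range(n_trials)])

print(f"true value            : {truth:.5f}")
print(f"passive    mean / std : {passive.mean():.5f} / {passive.std():.5f}")
print(f"importance mean / std : {weighted.mean():.5f} / {weighted.std():.5f}")
```

Both estimators target the same value, but the weighted one spends most of its label budget where the quantity of interest is non-zero, so its spread across repeated trials is smaller in this setup.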

Installation

Requires Python 3.7 or higher.

Dependencies:

numpy
scipy
treelib

Install using pip with:

$ pip install activeeval

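Example

The example below additionally requires scikit-learn, which is used to generate the data and train the classifier.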

from activeeval.measures import FMeasure, BalancedAccuracy
from activeeval.proposals import StaticVarMin
from activeeval.pools import Pool
from activeeval.estimators import AISEstimator
from activeeval import Evaluator

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Generate an artificial imbalanced classification dataset
X, y = make_classification(n_samples=10000, weights=[0.99], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Train a classifier to evaluate
clf = LogisticRegression(class_weight="balanced").fit(X_train, y_train)

# Specify pool and target evaluation measure
pool_size = X_test.shape[0]
pool = Pool(features=X_test)
y_pred = clf.predict(X_test)
fmeasure = FMeasure(y_pred)

# Specify a static variance-minimizing proposal using the classifier to
# estimate the oracle response
response_est = clf.predict_proba(X_test)
proposal = StaticVarMin(pool, fmeasure, response_est, deterministic=True)

# Estimate the evaluation measure after collecting 1000 labels
evaluator = Evaluator(pool, fmeasure, proposal)
n_queries = 1000
for _ in range(n_queries):
    # Query an instance to label
    instance_id, weight = evaluator.query()

    # Get label from oracle
    label = y_test[instance_id]

    # Update
    evaluator.update(instance_id, label, weight)

print(f"Estimate of F1 score after {n_queries} oracle queries is", evaluator.estimate)

# Reuse the samples from above to estimate a different measure
bal_acc = BalancedAccuracy(y_pred)
bal_acc_est = AISEstimator(bal_acc)
for sample in evaluator.sample_history:
    bal_acc_est.update(sample.instance_id, sample.label, sample.weight)
print("Estimate of balanced accuracy using previous oracle queries is", bal_acc_est.get())
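
The reason previously collected samples can be reused for another measure is that each one carries an importance weight. As a rough sketch of the idea (not ActiveEval's own implementation), the snippet below computes a weighted F-measure from a list of (instance_id, label, weight) records. The `Sample` tuple and the `weighted_f_measure` function are hypothetical helpers written for this illustration, and the code assumes binary 0/1 labels and that each weight is the usual ratio of target to proposal probability.

```python
from typing import Iterable, NamedTuple, Sequence


class Sample(NamedTuple):
    """Hypothetical stand-in for one labelled query."""
    instance_id: int
    label: int      # oracle label, 0 or 1
    weight: float   # importance weight attached to the query


def weighted_f_measure(samples: Iterable[Sample],
                       y_pred: Sequence[int],
                       alpha: float = 0.5) -> float:
    """Ratio estimate of F_alpha = TP / (alpha * (TP + FP) + (1 - alpha) * (TP + FN))."""
    tp = pred_pos = true_pos = 0.0
    for s in samples:
        pred = y_pred[s.instance_id]
        tp += s.weight * s.label * pred       # weighted true positives
        pred_pos += s.weight * pred           # weighted TP + FP
        true_pos += s.weight * s.label        # weighted TP + FN
    denom = alpha * pred_pos + (1.0 - alpha) * true_pos
    return tp / denom if denom > 0 else float("nan")


# Toy check with uniform weights: TP = 2, FP = 1, FN = 1, so F1 = 2/3.
y_pred = [1, 0, 1, 0, 1]
history = [
    Sample(0, 1, 1.0),
    Sample(1, 1, 1.0),
    Sample(2, 0, 1.0),
    Sample(3, 0, 1.0),
    Sample(4, 1, 1.0),
]
f1 = weighted_f_measure(history, y_pred)
print("Weighted F1 estimate:", f1)
```

Because the estimate is a ratio of weighted sums, rescaling all weights by a constant leaves it unchanged, and swapping in different sums over the same weighted samples gives an estimate of a different measure.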

Support

Please open an issue in this repository.

License

ActiveEval is released under the MIT license.

Citation

[1] Neil G. Marchant and Benjamin I. P. Rubinstein. 2021. Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’21), ACM, New York, NY, USA, 11 pages. DOI: 10.1145/3447548.3467435. arXiv: 2006.06963.
