From 6231b1c04aecefe41c9fe1c4ced29550ba7fff2d Mon Sep 17 00:00:00 2001 From: Eddie Bergman Date: Tue, 27 Jul 2021 17:13:45 +0200 Subject: [PATCH 1/4] Leaderboard (#1185) * Implemented `def leaderboard` Still requires testing, only works for classification * Fixed some bugs * Updated function with new params * Cleaned info gathering a little * Identifies if classifier or regressor models * Implemented sort_by param * Added ranking column * Implemented ensemble_only param for leadboard * Implemented param top_k * flake8'd * Created fixtures for use with test_leaderboard * Moved fixtures to conftest, added session scope tmp_dir For the autoML models to be useable for the entire session without training, they require a session scoped tmp_dir. I tried to figure out how to make the tmp_dir more dynamic but documentation seems to imply that the scope is set at *function definition*, not on function call. This means either call the _tmp_dir and manually clean up or just duplicate the tmp_dir function but aptly named for session scope. It's a bit ugly but couldn't find an alternative. * Can't make tmp_dir for session scope fixtures Doesn't populate the request.module object if requesting from a session scope. For now module will have to do * Reverted back, models trained in test * Moved `leaderboard` AutoML -> AutoSklearnEstimator * Added fuzzing test for test_leaderboard * Added tests for leaderboard, added sort_order * Removed Type Final to support python 3.7 * Removed old solution to is_classication for leaderboard * I should really force pre-commit to run before commit (flake8 fixes) * More occurences of Literal * Readded Literal but imported from typing_extensions * Fixed docstring for sphinx * Added make command to build html without running examples * Added doc/examples to gitignore Generating the sphinx examples causes output to be generated in doc/examples. Not sure if this should be pushed considering docs/build is not. * Added leadboard to basic examples Found a bug: /home/skantify/code/auto-sklearn/examples/20_basic/example_multilabel_classification.py failed to execute correctly: Traceback (most recent call last): File "/home/skantify/code/auto-sklearn/examples/20_basic/example_multilabel_classification.py", line 61, in print(automl.leaderboard()) File "/home/skantify/code/auto-sklearn/autosklearn/estimators.py", line 738, in leaderboard model_runs[model_id]['ensemble_weight'] = self.automl_.ensemble_.weights_[i] KeyError: 2 * Cleaned up _str_ of EnsembleSelection * Fixed discrepancy between config_id and model_id There is a discrepency between identifiers used by SMAC and and the identifiers used by an Ensemble class. SMAC uses `config_id` which is available for every run of SMAC while Ensemble uses `model_id == num_run` which is only available in runinfo.additional_info. However, this is not always included in additional_info, nor is additional_info garunteed to exist. Therefore the only garunteed unique identifier for models are `config_id`s which can confuse the user if they wise to interact with the ensembler. * Readded desired code for design choice on model indexing There are two indexes that can be used, SMAC uses `config_id` and asklearn uses `num_run`, these are not garunteed to be equal and also `num_run` is not always present. As the user should not care that there is possible 2 indexes for models, made the choice to show `config_id` as this allows displaying info on failed runs. 
An alternative to show asklearn's `num_run` index is just to exclude any failed runs from showing up in the leaderboard. * Removed Literal again as typing_extensions is external module * Switched to model_id as primary id Any runs which do not provide a model_id == num_run are essentially discarded. This hsould change in the future but the fix is outside the scope of the PR. * pre-commit flake8 fix * Logger gives warning if sort_by is not in columns asked for * Moved column types to static method * Fixed rank to be based on cost * Fixed so model_id can be requested, even though it always exists * Fixed so rank can be calculated even if cost not requested * Readded Literal and included typing_extension dependancy Once Python 3.7 is dropped, we can drop typing_extensions * Changed default sort_order to 'auto' * Changed leaderboard columns to be static attributes * Update budget doc Co-authored-by: Matthias Feurer * flake8'd Co-authored-by: Matthias Feurer --- .gitignore | 5 + autosklearn/automl.py | 1 - autosklearn/ensembles/ensemble_selection.py | 22 +- autosklearn/estimators.py | 272 +++++++++++++++++- doc/Makefile | 8 +- examples/20_basic/example_classification.py | 7 + .../example_multilabel_classification.py | 7 + .../example_multioutput_regression.py | 7 + examples/20_basic/example_regression.py | 6 + requirements.txt | 1 + test/conftest.py | 7 +- test/test_automl/test_estimators.py | 120 +++++++- 12 files changed, 444 insertions(+), 19 deletions(-) diff --git a/.gitignore b/.gitignore index 92fa37b152..9d2b72965a 100755 --- a/.gitignore +++ b/.gitignore @@ -1,8 +1,13 @@ # Documentation docs/build/* +docs/examples *.py[cod] +# Exmaples +# examples 40_advanced generate a tmp_folder +examples/40_advanced/tmp_folder + # C extensions *.c *.so diff --git a/autosklearn/automl.py b/autosklearn/automl.py index dbea7996a1..bb8f6b5ddb 100644 --- a/autosklearn/automl.py +++ b/autosklearn/automl.py @@ -201,7 +201,6 @@ def __init__(self, self.cv_models_ = None self.ensemble_ = None self._can_predict = False - self._debug_mode = debug_mode self.InputValidator = None # type: Optional[InputValidator] diff --git a/autosklearn/ensembles/ensemble_selection.py b/autosklearn/ensembles/ensemble_selection.py index 363bf000ac..6ef4f42c93 100644 --- a/autosklearn/ensembles/ensemble_selection.py +++ b/autosklearn/ensembles/ensemble_selection.py @@ -278,14 +278,20 @@ def predict(self, predictions: Union[np.ndarray, List[np.ndarray]]) -> np.ndarra return average def __str__(self) -> str: - return 'Ensemble Selection:\n\tTrajectory: %s\n\tMembers: %s' \ - '\n\tWeights: %s\n\tIdentifiers: %s' % \ - (' '.join(['%d: %5f' % (idx, performance) - for idx, performance in enumerate(self.trajectory_)]), - self.indices_, self.weights_, - ' '.join([str(identifier) for idx, identifier in - enumerate(self.identifiers_) - if self.weights_[idx] > 0])) + trajectory_str = ' '.join([ + f'{id}: {perf:.5f}' + for id, perf in enumerate(self.trajectory_) + ]) + identifiers_str = ' '.join([ + f'{identifier}' + for idx, identifier in enumerate(self.identifiers_) + if self.weights_[idx] > 0 + ]) + return ("Ensemble Selection:\n" + f"\tTrajectory: {trajectory_str}\n" + f"\tMembers: {self.indices_}\n" + f"\tWeights: {self.weights_}\n" + f"\tIdentifiers: {identifiers_str}\n") def get_models_with_weights( self, diff --git a/autosklearn/estimators.py b/autosklearn/estimators.py index 62b7cd597d..b924644d8a 100644 --- a/autosklearn/estimators.py +++ b/autosklearn/estimators.py @@ -1,11 +1,12 @@ # -*- encoding: utf-8 -*- - -from typing import 
Optional, Dict, List, Tuple, Union +from typing import Optional, Dict, List, Tuple, Union, Iterable, ClassVar +from typing_extensions import Literal from ConfigSpace.configuration_space import Configuration import dask.distributed import joblib import numpy as np +import pandas as pd from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin from sklearn.utils.multiclass import type_of_target from smac.runhistory.runhistory import RunInfo, RunValue @@ -21,6 +22,18 @@ class AutoSklearnEstimator(BaseEstimator): + # Constants used by `def leaderboard` for columns and their sort order + _leaderboard_columns: ClassVar[Dict[str, List[str]]] = { + "all": [ + "model_id", "rank", "ensemble_weight", "type", "cost", "duration", + "config_id", "train_loss", "seed", "start_time", "end_time", + "budget", "status", "data_preprocessors", "feature_preprocessors", + "balancing_strategy", "config_origin" + ], + "simple": [ + "model_id", "rank", "ensemble_weight", "type", "cost", "duration" + ] + } def __init__( self, @@ -550,6 +563,261 @@ def sprint_statistics(self): """ return self.automl_.sprint_statistics() + def leaderboard( + self, + detailed: bool = False, + ensemble_only: bool = True, + top_k: Union[int, Literal['all']] = 'all', + sort_by: str = 'cost', + sort_order: Literal['auto', 'ascending', 'descending'] = 'auto', + include: Optional[Union[str, Iterable[str]]] = None + ) -> pd.DataFrame: + """ Returns a pandas table of results for all evaluated models. + + Gives an overview of all models trained during the search process along + with various statistics about their training. + + The availble statistics are: + + **Simple**: + + * ``"model_id"`` - The id given to a model by ``autosklearn``. + * ``"rank"`` - The rank of the model based on it's ``"cost"``. + * ``"ensemble_weight"`` - The weight given to the model in the ensemble. + * ``"type"`` - The type of classifier/regressor used. + * ``"cost"`` - The loss of the model on the validation set. + * ``"duration"`` - Length of time the model was optimized for. + + **Detailed**: + The detailed view includes all of the simple statistics along with the + following. + + * ``"config_id"`` - The id used by SMAC for optimization. + * ``"budget"`` - How much budget was allocated to this model. + * ``"status"`` - The return status of training the model with SMAC. + * ``"train_loss"`` - The loss of the model on the training set. + * ``"balancing_strategy"`` - The balancing strategy used for data preprocessing. + * ``"start_time"`` - Time the model began being optimized + * ``"end_time"`` - Time the model ended being optimized + * ``"data_preprocessors"`` - The preprocessors used on the data + * ``"feature_preprocessors"`` - The preprocessors for features types + + Parameters + ---------- + detailed: bool = False + Whether to give detailed information or just a simple overview. + + ensemble_only: bool = True + Whether to view only models included in the ensemble or all models + trained. + + top_k: int or "all" = "all" + How many models to display. + + sort_by: str = 'cost' + What column to sort by. If that column is not present, the + sorting defaults to the ``"model_id"`` index column. + + sort_order: "auto" or "ascending" or "descending" = "auto" + Which sort order to apply to the ``sort_by`` column. If left + as ``"auto"``, it will sort by a sensible default where "better" is + on top, otherwise defaulting to the pandas default for + `DataFrame.sort_values`_ if there is no obvious "better". + + .. 
_DataFrame.sort_values: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html + + include: Optional[str or Iterable[str]] + Items to include, other items not specified will be excluded. + The exception is the ``"model_id"`` index column which is always included. + + If left as ``None``, it will resort back to using the ``detailed`` + param to decide the columns to include. + + Returns + ------- + pd.DataFrame + A dataframe of statistics for the models, ordered by ``sort_by``. + + """ # noqa (links are too long) + # TODO validate that `self` is fitted. This is required for + # self.ensemble_ to get the identifiers of models it will generate + # weights for. + column_types = { + 'all': AutoSklearnEstimator._leaderboard_columns['all'], + 'simple': AutoSklearnEstimator._leaderboard_columns['simple'], + 'detailed': AutoSklearnEstimator._leaderboard_columns['all'] + } + + # Validation of top_k + if ( + not (isinstance(top_k, str) or isinstance(top_k, int)) + or (isinstance(top_k, str) and top_k != 'all') + or (isinstance(top_k, int) and top_k <= 0) + ): + raise ValueError(f"top_k={top_k} must be a positive integer or pass" + " `top_k`='all' to view results for all models") + + # Validate columns to include + if isinstance(include, str): + include = [include] + + if include is not None: + columns = [*include] + + # 'model_id' should always be present as it is the unique index + # used for pandas + if 'model_id' not in columns: + columns.append('model_id') + + invalid_include_items = set(columns) - set(column_types['all']) + if len(invalid_include_items) != 0: + raise ValueError(f"Values {invalid_include_items} are not known" + f" columns to include, must be contained in " + f"{column_types['all']}") + elif detailed: + columns = column_types['all'] + else: + columns = column_types['simple'] + + # Validation of sorting + if sort_by not in column_types['all']: + raise ValueError(f"sort_by='{sort_by}' must be one of included " + f"columns {set(column_types['all'])}") + + valid_sort_orders = ['auto', 'ascending', 'descending'] + if not (isinstance(sort_order, str) and sort_order in valid_sort_orders): + raise ValueError(f"`sort_order` = {sort_order} must be a str in " + f"{valid_sort_orders}") + + # To get all the models that were optmized, we collect what we can from + # runhistory first. 
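        # Illustrative sketch of what each runhistory entry holds (the field names are
        # taken from the loop below; the concrete values here are hypothetical):
        #   run key   -> config_id=5, seed=1, budget=0.0
        #   run value -> cost=0.042, time=1.3, status=SUCCESS, additional_info=
        #                {'num_run': 7, 'train_loss': 0.021,
        #                 'configuration_origin': 'Initial design'}
        # Only runs whose additional_info contains 'num_run' can be tied back to a
        # saved model and its ensemble weight, which is why has_key() filters on it.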
+ def has_key(rv, key): + return rv.additional_info and key in rv.additional_info + + model_runs = { + rval.additional_info['num_run']: { + 'model_id': rval.additional_info['num_run'], + 'seed': rkey.seed, + 'budget': rkey.budget, + 'duration': rval.time, + 'config_id': rkey.config_id, + 'start_time': rval.starttime, + 'end_time': rval.endtime, + 'status': str(rval.status), + 'cost': rval.cost, + 'train_loss': rval.additional_info['train_loss'] + if has_key(rval, 'train_loss') else None, + 'config_origin': rval.additional_info['configuration_origin'] + if has_key(rval, 'configuration_origin') else None + } + for rkey, rval in self.automl_.runhistory_.data.items() + if has_key(rval, 'num_run') + } + + # Next we get some info about the model itself + model_class_strings = { + AutoMLClassifier: 'classifier', + AutoMLRegressor: 'regressor' + } + model_type = model_class_strings.get(self._get_automl_class(), None) + if model_type is None: + raise RuntimeError(f"Unknown `automl_class` {self._get_automl_class()}") + + # A dict mapping model ids to their configurations + configurations = self.automl_.runhistory_.ids_config + + for model_id, run_info in model_runs.items(): + config_id = run_info['config_id'] + run_config = configurations[config_id]._values + + run_info.update({ + 'balancing_strategy': run_config.get('balancing:strategy', None), + 'type': run_config[f'{model_type}:__choice__'], + 'data_preprocessors': [ + value for key, value in run_config.items() + if 'data_preprocessing' in key and '__choice__' in key + ], + 'feature_preprocessors': [ + value for key, value in run_config.items() + if 'feature_preprocessor' in key and '__choice__' in key + ] + }) + + # Get the models ensemble weight if it has one + # TODO both implementing classes of AbstractEnsemble have a property + # `identifiers_` and `weights_`, might be good to put it as an + # abstract property + # TODO `ensemble_.identifiers_` and `ensemble_.weights_` are loosely + # tied together by ordering, might be better to store as tuple + for i, weight in enumerate(self.automl_.ensemble_.weights_): + (_, model_id, _) = self.automl_.ensemble_.identifiers_[i] + model_runs[model_id]['ensemble_weight'] = weight + + # Filter out non-ensemble members if needed, else fill in a default + # value of 0 if it's missing + if ensemble_only: + model_runs = { + model_id: info + for model_id, info in model_runs.items() + if ('ensemble_weight' in info and info['ensemble_weight'] > 0) + } + else: + for model_id, info in model_runs.items(): + if 'ensemble_weight' not in info: + info['ensemble_weight'] = 0 + + # `rank` relies on `cost` so we include `cost` + # We drop it later if it's not requested + if 'rank' in columns and 'cost' not in columns: + columns = [*columns, 'cost'] + + # Finally, convert into a tabular format by converting the dict into + # column wise orientation. 
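        # Sketch of the column-wise orientation (values are hypothetical): from
        #   model_runs == {3: {'model_id': 3, 'cost': 0.04, ...},
        #                  7: {'model_id': 7, 'cost': 0.06, ...}}
        # the comprehension below builds equal-length columns such as
        #   {'model_id': [3, 7], 'cost': [0.04, 0.06], ...}
        # so that pandas produces one row per evaluated model.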
+ dataframe = pd.DataFrame({ + col: [run_info[col] for run_info in model_runs.values()] + for col in columns if col != 'rank' + }) + + # Give it an index, even if not in the `include` + dataframe.set_index('model_id', inplace=True) + + # Add the `rank` column if needed, dropping `cost` if it's not + # requested by the user + if 'rank' in columns: + dataframe.sort_values(by='cost', ascending=False, inplace=True) + dataframe.insert(column='rank', + value=range(1, len(dataframe) + 1), + loc=list(columns).index('rank')) + + if 'cost' not in columns: + dataframe.drop('cost', inplace=True) + + # Decide on the sort order depending on what it gets sorted by + descending_columns = ['ensemble_weight', 'duration'] + if sort_order == 'auto': + ascending_param = False if sort_by in descending_columns else True + else: + ascending_param = False if sort_order == 'descending' else True + + # Sort by the given column name, defaulting to 'model_id' if not present + if sort_by not in dataframe.columns: + self.automl_._logger.warning(f"sort_by = '{sort_by}' was not present" + ", defaulting to sort on the index " + "'model_id'") + sort_by = 'model_id' + + dataframe.sort_values(by=sort_by, + ascending=ascending_param, + inplace=True) + + # Lastly, just grab the top_k + if top_k == 'all' or top_k >= len(dataframe): + top_k = len(dataframe) + + dataframe = dataframe.head(top_k) + + return dataframe + def _get_automl_class(self): raise NotImplementedError() diff --git a/doc/Makefile b/doc/Makefile index 9355370597..55274edc8c 100644 --- a/doc/Makefile +++ b/doc/Makefile @@ -19,7 +19,7 @@ ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . -.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext +.PHONY: help clean html html-noexamples dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext all: html @@ -59,6 +59,12 @@ html: @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." +html-noexamples: + $(SPHINXBUILD) -D plot_gallery=0 -b html $(ALLSPHINXOPTS) $(SOURCEDIR) $(BUILDDIR)/html + @echo + @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 
+ + dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo diff --git a/examples/20_basic/example_classification.py b/examples/20_basic/example_classification.py index 38bd328816..86fc09a5f4 100644 --- a/examples/20_basic/example_classification.py +++ b/examples/20_basic/example_classification.py @@ -32,6 +32,12 @@ ) automl.fit(X_train, y_train, dataset_name='breast_cancer') +############################################################################ +# View the models found by auto-sklearn +# ===================================== + +print(automl.leaderboard()) + ############################################################################ # Print the final ensemble constructed by auto-sklearn # ==================================================== @@ -44,3 +50,4 @@ predictions = automl.predict(X_test) print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions)) + diff --git a/examples/20_basic/example_multilabel_classification.py b/examples/20_basic/example_multilabel_classification.py index 30f9be498b..b46caa2233 100644 --- a/examples/20_basic/example_multilabel_classification.py +++ b/examples/20_basic/example_multilabel_classification.py @@ -54,6 +54,13 @@ ) automl.fit(X_train, y_train, dataset_name='reuters') +############################################################################ +# View the models found by auto-sklearn +# ===================================== + +print(automl.leaderboard()) + + ############################################################################ # Print the final ensemble constructed by auto-sklearn # ==================================================== diff --git a/examples/20_basic/example_multioutput_regression.py b/examples/20_basic/example_multioutput_regression.py index e79a40f927..5db733da0a 100644 --- a/examples/20_basic/example_multioutput_regression.py +++ b/examples/20_basic/example_multioutput_regression.py @@ -35,6 +35,13 @@ ) automl.fit(X_train, y_train, dataset_name='synthetic') +############################################################################ +# View the models found by auto-sklearn +# ===================================== + +print(automl.leaderboard()) + + ############################################################################ # Print the final ensemble constructed by auto-sklearn # ==================================================== diff --git a/examples/20_basic/example_regression.py b/examples/20_basic/example_regression.py index f7fc1199ae..adfc390dab 100644 --- a/examples/20_basic/example_regression.py +++ b/examples/20_basic/example_regression.py @@ -33,6 +33,12 @@ ) automl.fit(X_train, y_train, dataset_name='diabetes') +############################################################################ +# View the models found by auto-sklearn +# ===================================== + +print(automl.leaderboard()) + ###################################################### # Print the final ensemble constructed by auto-sklearn # ==================================================== diff --git a/requirements.txt b/requirements.txt index a8df33ca2f..7447ed865d 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,4 +1,5 @@ setuptools +typing_extensions numpy>=1.9.0 scipy>=0.14.1,<1.7.0 diff --git a/test/conftest.py b/test/conftest.py index 4814a8ff7f..10d9f3607d 100644 --- a/test/conftest.py +++ b/test/conftest.py @@ -75,11 +75,10 @@ def tmp_dir(request): def _dir_fixture(dir_type, request): - test_dir = os.path.dirname(__file__) - dir = os.path.join( - test_dir, '.%s__%s__%s' % 
(dir_type, request.module.__name__, request.node.name) - ) + + dirname = f".{dir_type}__{request.module.__name__}__{request.node.name}" + dir = os.path.join(test_dir, dirname) for i in range(10): if os.path.exists(dir): diff --git a/test/test_automl/test_estimators.py b/test/test_automl/test_estimators.py index 2c3375df0d..219981afcc 100644 --- a/test/test_automl/test_estimators.py +++ b/test/test_automl/test_estimators.py @@ -1,8 +1,11 @@ +from typing import Type + import copy import glob import importlib import os import inspect +import itertools import pickle import re import sys @@ -29,9 +32,9 @@ import autosklearn.pipeline.util as putil from autosklearn.ensemble_builder import MODEL_FN_RE import autosklearn.estimators # noqa F401 -from autosklearn.estimators import AutoSklearnEstimator -from autosklearn.classification import AutoSklearnClassifier -from autosklearn.regression import AutoSklearnRegressor +from autosklearn.estimators import ( + AutoSklearnEstimator, AutoSklearnRegressor, AutoSklearnClassifier +) from autosklearn.metrics import accuracy, f1_macro, mean_squared_error, r2 from autosklearn.automl import AutoMLClassifier from autosklearn.experimental.askl2 import AutoSklearn2Classifier @@ -317,6 +320,117 @@ def test_cv_results(tmp_dir): assert hasattr(cls, 'classes_') +@pytest.mark.parametrize('estimator_type,dataset_name', [ + (AutoSklearnClassifier, 'iris'), + (AutoSklearnRegressor, 'boston') +]) +def test_leaderboard( + tmp_dir: str, + estimator_type: Type[AutoSklearnEstimator], + dataset_name: str +): + # Comprehensive test tasks a substantial amount of time, manually set if + # required. + MAX_COMBO_SIZE_FOR_INCLUDE_PARAM = 2 # [0, len(valid_columns) + 1] + + X_train, Y_train, _, _ = putil.get_dataset(dataset_name) + model = estimator_type( + time_left_for_this_task=30, + per_run_time_limit=5, + tmp_folder=tmp_dir, + seed=1 + ) + model.fit(X_train, Y_train) + + column_types = { + 'all': AutoSklearnEstimator._leaderboard_columns['all'], + 'simple': AutoSklearnEstimator._leaderboard_columns['simple'], + 'detailed': AutoSklearnEstimator._leaderboard_columns['all'] + } + # Create a dict of all possible param values for each param + # with some invalid one's of the incorrect type + include_combinations = itertools.chain( + itertools.combinations(column_types['all'], item_count) + for item_count in range(1, MAX_COMBO_SIZE_FOR_INCLUDE_PARAM) + ) + valid_params = { + 'detailed': [True, False], + 'ensemble_only': [True, False], + 'top_k': [-10, 0, 1, 10, 'all'], + 'sort_by': [column_types['all'], 'invalid'], + 'sort_order': ['ascending', 'descending', 'auto', 'invalid', None], + 'include': itertools.chain([None, 'invalid', 'type'], include_combinations), + } + + # Create a generator of all possible combinations of valid_params + params_generator = iter( + dict(zip(valid_params.keys(), param_values)) + for param_values in itertools.product(*valid_params.values()) + ) + + for params in params_generator: + + # Invalid top_k should raise an error, is a positive int or 'all' + if not (params['top_k'] == 'all' or params['top_k'] > 0): + with pytest.raises(ValueError): + model.leaderboard(**params) + + # Invalid sort_by column + elif params['sort_by'] not in column_types['all']: + with pytest.raises(ValueError): + model.leaderboard(**params) + + # Shouldn't accept an invalid sort order + elif params['sort_order'] not in ['ascending', 'descending', 'auto']: + with pytest.raises(ValueError): + model.leaderboard(**params) + + # Invalid include item in a list + elif params['include'] is not 
None: + # Crash if just a str but invalid column + if ( + isinstance(params['include'], str) + and params['include'] not in column_types['all'] + ): + with pytest.raises(ValueError): + model.leaderboard(**params) + # Crash if list but contains invalid column + elif ( + not isinstance(params['include'], str) + and len(set(params['include']) - set(column_types['all'])) != 0 + ): + with pytest.raises(ValueError): + model.leaderboard(**params) + + # Should run without an error if all params are valid + else: + # Validate the outputs + leaderboard = model.leaderboard(**params) + + # top_k should never be less than the rows given back + # It can however be larger + if isinstance(params['top_k'], int): + assert params['top_k'] >= len(leaderboard) + + # Check the right columns are present and in the right order + # The id is set as the index but is not included in pandas columns + columns = list(leaderboard.columns) + if params['include'] is not None: + assert columns == list(params['include']) + elif params['detailed']: + assert columns == column_types['detailed'] + else: + assert columns == column_types['simple'] + + # Ensure that if it's ensemble only + # Can only check if 'ensemble_weight' is present + if ( + params['ensemble_only'] + and 'ensemble_weight' in columns + ): + assert all(leaderboard['ensemble_weight'] > 0) + + @unittest.mock.patch('autosklearn.estimators.AutoSklearnEstimator.build_automl') def test_fit_n_jobs_negative(build_automl_patch): n_cores = cpu_count() From 53daf7ea4730cda633324497f23421dff5002794 Mon Sep 17 00:00:00 2001 From: Eddie Bergman Date: Tue, 27 Jul 2021 23:23:38 +0200 Subject: [PATCH 2/4] Leaderboard rank fix (#1191) * Fixes for valid parameters not being tested * flake8'd --- autosklearn/estimators.py | 53 +++++++++-------- test/test_automl/test_estimators.py | 90 +++++++++++++++++------------ 2 files changed, 82 insertions(+), 61 deletions(-) diff --git a/autosklearn/estimators.py b/autosklearn/estimators.py index b924644d8a..2878a24d70 100644 --- a/autosklearn/estimators.py +++ b/autosklearn/estimators.py @@ -1,5 +1,5 @@ # -*- encoding: utf-8 -*- -from typing import Optional, Dict, List, Tuple, Union, Iterable, ClassVar +from typing import Optional, Dict, List, Tuple, Union, Iterable from typing_extensions import Literal from ConfigSpace.configuration_space import Configuration @@ -22,18 +22,6 @@ class AutoSklearnEstimator(BaseEstimator): - # Constants used by `def leaderboard` for columns and their sort order - _leaderboard_columns: ClassVar[Dict[str, List[str]]] = { - "all": [ - "model_id", "rank", "ensemble_weight", "type", "cost", "duration", - "config_id", "train_loss", "seed", "start_time", "end_time", - "budget", "status", "data_preprocessors", "feature_preprocessors", - "balancing_strategy", "config_origin" - ], - "simple": [ - "model_id", "rank", "ensemble_weight", "type", "cost", "duration" - ] - } def __init__( self, @@ -642,11 +630,7 @@ def leaderboard( # TODO validate that `self` is fitted. This is required for # self.ensemble_ to get the identifiers of models it will generate # weights for. 
- column_types = { - 'all': AutoSklearnEstimator._leaderboard_columns['all'], - 'simple': AutoSklearnEstimator._leaderboard_columns['simple'], - 'detailed': AutoSklearnEstimator._leaderboard_columns['all'] - } + column_types = AutoSklearnEstimator._leaderboard_columns() # Validation of top_k if ( @@ -661,6 +645,9 @@ def leaderboard( if isinstance(include, str): include = [include] + if include == ['model_id']: + raise ValueError('Must provide more than just `model_id`') + if include is not None: columns = [*include] @@ -784,10 +771,10 @@ def has_key(rv, key): # Add the `rank` column if needed, dropping `cost` if it's not # requested by the user if 'rank' in columns: - dataframe.sort_values(by='cost', ascending=False, inplace=True) + dataframe.sort_values(by='cost', ascending=True, inplace=True) dataframe.insert(column='rank', value=range(1, len(dataframe) + 1), - loc=list(columns).index('rank')) + loc=list(columns).index('rank') - 1) # account for `model_id` if 'cost' not in columns: dataframe.drop('cost', inplace=True) @@ -806,9 +793,15 @@ def has_key(rv, key): "'model_id'") sort_by = 'model_id' - dataframe.sort_values(by=sort_by, - ascending=ascending_param, - inplace=True) + # Cost can be the same but leave rank all over the place + if 'rank' in columns and sort_by == 'cost': + dataframe.sort_values(by=[sort_by, 'rank'], + ascending=[ascending_param, True], + inplace=True) + else: + dataframe.sort_values(by=sort_by, + ascending=ascending_param, + inplace=True) # Lastly, just grab the top_k if top_k == 'all' or top_k >= len(dataframe): @@ -818,6 +811,20 @@ def has_key(rv, key): return dataframe + @staticmethod + def _leaderboard_columns() -> Dict[Literal['all', 'simple', 'detailed'], List[str]]: + all = [ + "model_id", "rank", "ensemble_weight", "type", "cost", "duration", + "config_id", "train_loss", "seed", "start_time", "end_time", + "budget", "status", "data_preprocessors", "feature_preprocessors", + "balancing_strategy", "config_origin" + ] + simple = [ + "model_id", "rank", "ensemble_weight", "type", "cost", "duration" + ] + detailed = all + return {'all': all, 'detailed': detailed, 'simple': simple} + def _get_automl_class(self): raise NotImplementedError() diff --git a/test/test_automl/test_estimators.py b/test/test_automl/test_estimators.py index 219981afcc..94329629b9 100644 --- a/test/test_automl/test_estimators.py +++ b/test/test_automl/test_estimators.py @@ -331,22 +331,9 @@ def test_leaderboard( ): # Comprehensive test tasks a substantial amount of time, manually set if # required. 
- MAX_COMBO_SIZE_FOR_INCLUDE_PARAM = 2 # [0, len(valid_columns) + 1] + MAX_COMBO_SIZE_FOR_INCLUDE_PARAM = 3 # [0, len(valid_columns) + 1] + column_types = AutoSklearnEstimator._leaderboard_columns() - X_train, Y_train, _, _ = putil.get_dataset(dataset_name) - model = estimator_type( - time_left_for_this_task=30, - per_run_time_limit=5, - tmp_folder=tmp_dir, - seed=1 - ) - model.fit(X_train, Y_train) - - column_types = { - 'all': AutoSklearnEstimator._leaderboard_columns['all'], - 'simple': AutoSklearnEstimator._leaderboard_columns['simple'], - 'detailed': AutoSklearnEstimator._leaderboard_columns['all'] - } # Create a dict of all possible param values for each param # with some invalid one's of the incorrect type include_combinations = itertools.chain( @@ -357,7 +344,7 @@ def test_leaderboard( 'detailed': [True, False], 'ensemble_only': [True, False], 'top_k': [-10, 0, 1, 10, 'all'], - 'sort_by': [column_types['all'], 'invalid'], + 'sort_by': [*column_types['all'], 'invalid'], 'sort_order': ['ascending', 'descending', 'auto', 'invalid', None], 'include': itertools.chain([None, 'invalid', 'type'], include_combinations), } @@ -368,7 +355,19 @@ def test_leaderboard( for param_values in itertools.product(*valid_params.values()) ) + X_train, Y_train, _, _ = putil.get_dataset(dataset_name) + model = estimator_type( + time_left_for_this_task=30, + per_run_time_limit=5, + tmp_folder=tmp_dir, + seed=1 + ) + model.fit(X_train, Y_train) + for params in params_generator: + # Convert from iterator to solid list + if params['include'] is not None and not isinstance(params['include'], str): + params['include'] = list(params['include']) # Invalid top_k should raise an error, is a positive int or 'all' if not (params['top_k'] == 'all' or params['top_k'] > 0): @@ -385,26 +384,32 @@ def test_leaderboard( with pytest.raises(ValueError): model.leaderboard(**params) - # Invalid include item in a list - elif params['include'] is not None: - # Crash if just a str but invalid column - if ( - isinstance(params['include'], str) - and params['include'] not in column_types['all'] - ): - with pytest.raises(ValueError): - model.leaderboard(**params) - # Crash if list but contains invalid column - elif ( - not isinstance(params['include'], str) - and len(set(params['include']) - set(column_types['all'])) != 0 - ): - with pytest.raises(ValueError): - model.leaderboard(**params) + # include is single str but not valid + elif ( + isinstance(params['include'], str) + and params['include'] not in column_types['all'] + ): + with pytest.raises(ValueError): + model.leaderboard(**params) + + # Crash if include is list but contains invalid column + elif ( + isinstance(params['include'], list) + and len(set(params['include']) - set(column_types['all'])) != 0 + ): + with pytest.raises(ValueError): + model.leaderboard(**params) + + # Can't have just model_id, in both single str and list case + elif ( + params['include'] == 'model_id' + or params['include'] == ['model_id'] + ): + with pytest.raises(ValueError): + model.leaderboard(**params) - # Should run without an error if all params are valid + # Else all valid combinations should be validated else: - # Validate the outputs leaderboard = model.leaderboard(**params) # top_k should never be less than the rows given back @@ -413,14 +418,23 @@ def test_leaderboard( assert params['top_k'] >= len(leaderboard) # Check the right columns are present and in the right order - # The id is set as the index but is not included in pandas columns + # The model_id is set as the index, not included 
in pandas columns columns = list(leaderboard.columns) + + def exclude(lst, s): + return [x for x in lst if x != s] + if params['include'] is not None: - assert columns == list(params['include']) + # Include with only single str should be the only column + if isinstance(params['include'], str): + assert params['include'] in columns and len(columns) == 1 + # Include as a list should have all the columns without model_id + else: + assert columns == exclude(params['include'], 'model_id') elif params['detailed']: - assert columns == column_types['detailed'] + assert columns == exclude(column_types['detailed'], 'model_id') else: - assert columns == column_types['simple'] + assert columns == exclude(column_types['simple'], 'model_id') # Ensure that if it's ensemble only # Can only check if 'ensemble_weight' is present From 2ad8da1d0c9f34a4879cc41fa1ce8ee9b9317db1 Mon Sep 17 00:00:00 2001 From: Eddie Bergman Date: Wed, 28 Jul 2021 00:20:19 +0200 Subject: [PATCH 3/4] Asklearn development with smac development test (#1187) * Changes required to test if will work with smac@development * Changes required to test if will work with smac@development * Fixed failing tests with new scipy 1.7 on sparse data * flake8'd * Use SMAC from pypi again * undo changes Co-authored-by: Matthias Feurer --- doc/releases.rst | 32 +++++++++++++++++++ requirements.txt | 4 +-- .../pyMetaLearn/test_meta_features_sparse.py | 26 +++++++-------- 3 files changed, 47 insertions(+), 15 deletions(-) diff --git a/doc/releases.rst b/doc/releases.rst index 008690f332..7357e1972a 100644 --- a/doc/releases.rst +++ b/doc/releases.rst @@ -9,6 +9,38 @@ Releases ======== +Version 0.13.0 +============== + +* ADD #1100: Provide access to the callbacks of SMAC. +* ADD #1185: New leaderboard functionality to visualize models. +* FIX #1133: Refer to the correct attribute in an error message. +* FIX #1154: Allow running Auto-sklearn on a 32-bit system. +* MAINT #924: Instead of passing classes for the resampling strategy one has now to pass objects. +* MAINT #1108: Limit the number of threads used by numpy and/or scikit-learn via `threadpoolctl`. +* MAINT #1135: Simplify internal workflow of pandas handling. This results in pandas being passed + directly to scikit-learn models instead of being internally converted into a numpy array. + However, this should neither impact the behavior nor the performance of Auto-sklearn. +* MAINT #1157: Drop support for Python 3.6, enable support for Python 3.9. +* MAINT #1159: Remove the output directory argument to the classifier and regressor. Despite the + name, the output directory was not used and was a leftover from participating in the AutoML + challenges. +* MAINT #1187: Bump the required SMAC version to at least 0.14. +* DOC #1109: Add an FAQ. +* DOC #1126: Add new examples on how to use scikit-learn's inspect module. +* DOC #1136: Add a new example on how to perform multi-output regression. +* DOC #1152: Enable link checking when building the documentation. +* DOC #1158: New example on how to configure the logger for Auto-sklearn. +* DOC #1165: Improve the readme page.
+ +Contributors v0.13.0 +******************** + +* Matthias Feurer +* Eddie Bergman +* bitsbuffer +* Francisco Rivera Valverde + Version 0.12.7 ============== diff --git a/requirements.txt b/requirements.txt index 7447ed865d..a499bf8321 100644 --- a/requirements.txt +++ b/requirements.txt @@ -2,7 +2,7 @@ setuptools typing_extensions numpy>=1.9.0 -scipy>=0.14.1,<1.7.0 +scipy>=1.7.0 joblib scikit-learn>=0.24.0,<0.25.0 @@ -17,4 +17,4 @@ threadpoolctl ConfigSpace>=0.4.14,<0.5 pynisher>=0.6.3 pyrfr>=0.8.1,<0.9 -smac>=0.13.1,<0.14 +smac>=0.14 diff --git a/test/test_metalearning/pyMetaLearn/test_meta_features_sparse.py b/test/test_metalearning/pyMetaLearn/test_meta_features_sparse.py index 18b795cb45..6296ad23d8 100644 --- a/test/test_metalearning/pyMetaLearn/test_meta_features_sparse.py +++ b/test/test_metalearning/pyMetaLearn/test_meta_features_sparse.py @@ -256,12 +256,12 @@ def test_symbols_sum(sparse_data): def test_skewnesses(sparse_data_transformed): X_transformed, y, categorical_transformed = sparse_data_transformed - fixture = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 1.0, 0.0, -1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, - -0.6969708499033568, 0.626346013011263, - 0.3809987596624038, 1.4762248835141034, - 0.07687661087633726, 0.36889797830360116] + fixture = [ + 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, + 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, + -0.696970849903357, 0.626346013011262, 0.38099875966240554, + 1.4762248835141032, 0.07687661087633788, 0.3688979783036015 + ] mf = meta_features.helper_functions["Skewnesses"](X_transformed, y, logging.getLogger('Meta')) print(mf.value) print(fixture) @@ -269,15 +269,15 @@ def test_skewnesses(sparse_data_transformed): def test_kurtosisses(sparse_data_transformed): - fixture = [-3.0, -3.0, -2.0, -2.0, -3.0, -3.0, -3.0, -3.0, - -3.0, -2.0, -3.0, -2.0, -3.0, -3.0, -2.0, -3.0, - -3.0, -3.0, -3.0, -3.0, -3.0, -2.0, -3.0, - -3.0, -3.0, -1.1005836114255765, - -1.1786325509475712, -1.2387998382327912, - 1.393438264413704, -0.9768209837948336, - -1.7937072296512782] + fixture = [ + -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, + -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, + -3.0, -1.1005836114255763, -1.1786325509475744, -1.23879983823279, + 1.3934382644137013, -0.9768209837948336, -1.7937072296512784 + ] X_transformed, y, categorical_transformed = sparse_data_transformed mf = meta_features.helper_functions["Kurtosisses"](X_transformed, y, logging.getLogger('Meta')) + print(mf.value) np.testing.assert_allclose(mf.value, fixture) From 5dd38be0be443aee65346e7e5fc622c5c6d23ea2 Mon Sep 17 00:00:00 2001 From: Matthias Feurer Date: Wed, 28 Jul 2021 00:20:48 +0200 Subject: [PATCH 4/4] bump version number for new release --- autosklearn/__version__.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/autosklearn/__version__.py b/autosklearn/__version__.py index d2d1825e32..7b8cb709f9 100644 --- a/autosklearn/__version__.py +++ b/autosklearn/__version__.py @@ -1,4 +1,4 @@ """Version information.""" # The following line *must* be the last in the module, exactly as formatted: -__version__ = "0.12.7" +__version__ = "0.13.0"
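For reference, the feature this series introduces can be exercised with a minimal sketch along the lines of the basic classification example above; the dataset, time limits and column choices below are illustrative placeholders rather than recommendations.

import sklearn.datasets
import sklearn.model_selection
from autosklearn.classification import AutoSklearnClassifier

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

automl = AutoSklearnClassifier(time_left_for_this_task=60, per_run_time_limit=10)
automl.fit(X_train, y_train, dataset_name='breast_cancer')

# Simple view: 'model_id' index plus rank, ensemble_weight, type, cost, duration.
print(automl.leaderboard())

# Detailed view of every evaluated model, highest ensemble weight first
# (the 'auto' sort order puts the "better" end of this column on top).
print(automl.leaderboard(detailed=True, ensemble_only=False,
                         sort_by='ensemble_weight'))

# Restrict the columns shown; 'model_id' is always kept as the index.
print(automl.leaderboard(include=['rank', 'type', 'cost'], top_k=5))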