Releases: rasbt/mlxtend

Version 0.9.1

19 Nov 07:24
6610cb7

Version 0.9.1 (2017-11-19)

New Features
  • Added mlxtend.evaluate.bootstrap_point632_score to evaluate the performance of estimators using the .632 bootstrap. (#283)
  • New max_len parameter for the frequent itemset generation via the apriori function to allow for early stopping. (#270)
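As a rough illustration of what a .632 bootstrap estimate combines, the sketch below weights the resubstitution accuracy by 0.368 and the out-of-bag accuracy by 0.632, averaged over bootstrap rounds. It is a pure-Python sketch under stated assumptions, not mlxtend's implementation: the function names and the toy majority-class "model" are hypothetical.

```python
import random
from collections import Counter


def majority_label(labels):
    """Toy 'model' (hypothetical): always predicts the most common label."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, predicted_label):
    return sum(1 for y in y_true if y == predicted_label) / len(y_true)


def point632_accuracy(y, n_rounds=200, seed=0):
    """Average of 0.368 * training accuracy + 0.632 * out-of-bag accuracy."""
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(n_rounds):
        # Draw n indices with replacement; indices never drawn are out-of-bag.
        boot_idx = [rng.randrange(n) for _ in range(n)]
        oob_idx = sorted(set(range(n)) - set(boot_idx))
        if not oob_idx:  # very unlikely, but keeps the sketch safe
            continue
        model = majority_label([y[i] for i in boot_idx])
        # Assumption: resubstitution accuracy is measured on the full sample.
        train_acc = accuracy(y, model)
        oob_acc = accuracy([y[i] for i in oob_idx], model)
        scores.append(0.368 * train_acc + 0.632 * oob_acc)
    return sum(scores) / len(scores)


y = [0] * 70 + [1] * 30
estimate = point632_accuracy(y)
print(round(estimate, 3))
```

The out-of-bag term is pessimistic and the resubstitution term optimistic, which is why the two are blended rather than used alone.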
Changes
  • All feature index tuples in SequentialFeatureSelector are now in sorted order. (#262)
  • The SequentialFeatureSelector now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994).
    Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases.
    (#262)
  • utils.Counter now accepts a name variable to help distinguish between multiple counters; time precision can be set with the 'precision' kwarg, and the new attribute end_time holds the time the last iteration completed. (#278 via Mathew Savage)
Bug Fixes
  • Fixed a deprecation error that occurred with the McNemar test when using SciPy 1.0. (#283)

Version 0.9.0

22 Oct 00:31
New Features
  • Added evaluate.permutation_test, a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution. In other words, it is a procedure to test the null hypothesis that two groups are not significantly different (e.g., a treatment and a control group). (#250)
  • Added 'leverage' and 'conviction' as evaluation metrics to the frequent_patterns.association_rules function. (#246 & #247)
  • Added a loadings_ attribute to PrincipalComponentAnalysis to compute the factor loadings of the features on the principal components. (#251)
  • Allow grid search over classifiers/regressors in ensemble and stacking estimators. (#259)
  • New make_multiplexer_dataset function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. (#263)
  • Added a new BootstrapOutOfBag class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. (#265)
  • The parameters for StackingClassifier, StackingCVClassifier, StackingRegressor, StackingCVRegressor, and EnsembleVoteClassifier can now be tuned using scikit-learn's GridSearchCV (#254 via James Bourbeau)
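The idea behind the permutation test listed above can be sketched in a few lines of pure Python. This is an illustrative exact test on the difference in means, not the evaluate.permutation_test implementation; the function name and the toy data are hypothetical.

```python
from itertools import combinations


def exact_permutation_test(x, y):
    """Two-sided exact permutation test on the difference in means."""
    pooled = list(x) + list(y)
    n_x = len(x)
    n_y = len(pooled) - n_x
    total = sum(pooled)

    def mean_diff(sum_x):
        return sum_x / n_x - (total - sum_x) / n_y

    observed = abs(mean_diff(sum(x)))
    at_least_as_extreme = 0
    n_relabelings = 0
    # Enumerate every way to relabel n_x of the pooled values as "group x".
    for idx in combinations(range(len(pooled)), n_x):
        diff = abs(mean_diff(sum(pooled[i] for i in idx)))
        # Small tolerance guards against floating-point noise.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
        n_relabelings += 1
    return at_least_as_extreme / n_relabelings


treatment = [1, 2, 3]
control = [4, 5, 6]
p_value = exact_permutation_test(treatment, control)
print(p_value)  # 2 of the 20 relabelings are as extreme as the observed one
```

Exhaustive enumeration is only feasible for small samples; for larger ones, a random subset of relabelings is the usual approximation.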
Changes
  • The 'support' column returned by frequent_patterns.association_rules was changed to compute the support of "antecedent union consequent", and new 'antecedant support' and 'consequent support' columns were added to avoid ambiguity. (#245)
  • Allow the OnehotTransactions to be cloned via scikit-learn's clone function, which is required by e.g., scikit-learn's FeatureUnion or GridSearchCV (via Iaroslav Shcherbatyi). (#249)
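To make the distinction between the three support values concrete, the following sketch computes them for a single rule, together with confidence, lift, leverage, and conviction as they are commonly defined. It is a standalone illustration with hypothetical function names and toy transactions, not the association_rules code.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Metrics for the rule antecedent -> consequent (standard definitions)."""
    a = support(transactions, antecedent)
    c = support(transactions, consequent)
    # 'support' of the rule is the support of antecedent union consequent.
    ac = support(transactions, set(antecedent) | set(consequent))
    confidence = ac / a  # assumes the antecedent occurs at least once
    conviction = float("inf") if confidence == 1.0 else (1 - c) / (1 - confidence)
    return {
        "antecedent support": a,
        "consequent support": c,
        "support": ac,
        "confidence": confidence,
        "lift": confidence / c,
        "leverage": ac - a * c,
        "conviction": conviction,
    }


transactions = [
    ["milk", "bread"],
    ["milk"],
    ["bread"],
    ["milk", "bread", "eggs"],
    ["eggs"],
]
metrics = rule_metrics(transactions, ["milk"], ["bread"])
for name, value in metrics.items():
    print(name, round(value, 3))
```

Leverage is zero and lift is one when antecedent and consequent are independent; conviction grows without bound as confidence approaches one.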
Bug Fixes
  • Fix issues with self._init_time parameter in _IterativeModel subclasses. (#256)
  • Fix imprecision bug that occurred in plot_ecdf when run on Python 2.7. (#264)
  • The vectors from SVD in PrincipalComponentAnalysis are now being scaled so that the eigenvalues obtained via solver='eigen' and solver='svd' have the same magnitudes. (#251)

Version 0.8.0

09 Sep 08:47
New Features
  • Added a mlxtend.evaluate.bootstrap function that implements the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth) #232
  • SequentialFeatureSelector's k_features now accepts a string argument "best" or "parsimonious" for more "automated" feature selection. For instance, if "best" is provided, the feature selector will return the feature subset with the best cross-validation performance. If "parsimonious" is provided as an argument, the smallest feature subset that is within one standard error of the cross-validation performance will be selected. #238
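The "best" and "parsimonious" rules described above can be sketched as follows. The sketch assumes that "within one standard error" is measured against the best subset's score using the best subset's standard error; the function name, the input layout, and the toy scores are hypothetical and not taken from the library.

```python
def select_subset_size(results, mode="best"):
    """results maps subset size k -> (mean CV score, standard error)."""
    best_k = max(results, key=lambda k: results[k][0])
    if mode == "best":
        return best_k
    if mode == "parsimonious":
        best_score, best_se = results[best_k]
        # Assumption: the threshold is one standard error below the best score.
        threshold = best_score - best_se
        return min(k for k, (score, _) in results.items() if score >= threshold)
    raise ValueError("mode must be 'best' or 'parsimonious'")


cv_results = {
    1: (0.80, 0.02),
    2: (0.90, 0.02),
    3: (0.93, 0.02),
    4: (0.94, 0.02),
    5: (0.935, 0.02),
}
print(select_subset_size(cv_results, "best"))          # 4
print(select_subset_size(cv_results, "parsimonious"))  # 3
```

Here three features score 0.93, which is within 0.02 of the best score of 0.94, so the parsimonious rule prefers the smaller subset.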
Changes
  • SequentialFeatureSelector now uses np.nanmean over normal mean to support scorers that may return np.nan #211 (via mrkaiser)
  • The skip_if_stuck parameter was removed from SequentialFeatureSelector in favor of a more efficient implementation comparing the conditional inclusion/exclusion results (in the floating versions) to the performances of previously sampled feature sets that were cached #237
  • ExhaustiveFeatureSelector was modified to consume substantially less memory #195 (via Adam Erickson)
Bug Fixes
  • Fixed a bug where the SequentialFeatureSelector selected a feature subset larger than specified via the k_features tuple max-value #213

Version 0.7.0

23 Jun 03:36

Version 0.7.0 (2017-06-22)

New Features
Changes
  • The TensorFlow estimators have been removed from mlxtend, since TensorFlow now has very convenient ways to build on estimators, which render those implementations obsolete.
  • plot_decision_regions now supports plotting decision regions for more than 2 training features (#189, via James Bourbeau).
  • Parallel execution in mlxtend.feature_selection.SequentialFeatureSelector and mlxtend.feature_selection.ExhaustiveFeatureSelector is now performed over different feature subsets instead of the different cross-validation folds to better utilize machines with multiple processors if the number of features is large (#193, via @whalebot-helmsman).
  • Raise meaningful error messages if pandas DataFrames or Python lists of lists are fed into the StackingCVClassifier as fit arguments (#198).
  • The n_folds parameter of the StackingCVClassifier was changed to cv and can now accept any kind of cross validation technique that is available from scikit-learn. For example, StackingCVClassifier(..., cv=StratifiedKFold(n_splits=3)) or StackingCVClassifier(..., cv=GroupKFold(n_splits=3)) (#203, via Konstantinos Paliouras).
Bug Fixes
  • SequentialFeatureSelector now correctly accepts a None argument for the scoring parameter to infer the default scoring metric from scikit-learn classifiers and regressors (#171).
  • The plot_decision_regions function now supports pre-existing axes objects generated via matplotlib's plt.subplots. (#184, see example)
  • Made math.num_combinations and math.num_permutations numerically stable for large numbers of combinations and permutations (#200).
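One common way to keep such counts exact for large inputs is to stay in integer arithmetic and multiply incrementally instead of dividing large factorials as floats. The sketch below shows that approach; it is an assumption about how one might do it, not the fix applied in #200, and the function names simply mirror the ones mentioned above.

```python
import math


def num_combinations(n, k):
    """n choose k using exact integer arithmetic."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # symmetry keeps the loop short
    result = 1
    for i in range(1, k + 1):
        # After this step result equals C(n - k + i, i), so the
        # floor division never discards a remainder.
        result = result * (n - k + i) // i
    return result


def num_permutations(n, k):
    """Number of ordered selections of k items out of n."""
    if k < 0 or k > n:
        return 0
    result = 1
    for value in range(n - k + 1, n + 1):
        result *= value
    return result


print(num_combinations(20, 8))  # 125970
print(num_permutations(5, 2))   # 20
# Cross-check against the factorial definition for a large input.
print(num_combinations(100, 50) == math.factorial(100) // math.factorial(50) ** 2)
```

Because Python integers have arbitrary precision, the results stay exact where a floating-point formula would lose digits or overflow.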

Version 0.6.0

18 Mar 22:52

Version 0.6.0 (2017-03-18)

New Features
  • An association_rules function is implemented that generates rules based on a list of frequent itemsets (via Joshua Goerner).
Changes
  • Adds a black edgecolor to plots via plotting.plot_decision_regions to make markers more distinguishable from the background in matplotlib>=2.0.
  • The association submodule was renamed to frequent_patterns.
Bug Fixes
  • The DataFrame index of apriori results is now unique and ordered.

Version 0.5.1

14 Feb 06:26

Version 0.5.1 (2017-02-14)

The CHANGELOG for the current development version is available at
https://github.com/rasbt/mlxtend/blob/master/docs/sources/CHANGELOG.md.

New Features
  • The EnsembleVoteClassifier has a new refit attribute that prevents refitting classifiers if refit=False to save computational time.
  • Added a new lift_score function in evaluate to compute lift score (via Batuhan Bardak).
  • StackingClassifier and StackingRegressor support multivariate targets if the underlying models do (via kernc).
  • StackingClassifier has a new use_features_in_secondary attribute like StackingCVClassifier.
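For readers unfamiliar with the lift score mentioned above, the sketch below computes it for binary labels as precision divided by the base rate of the positive class. This follows the usual definition and is not copied from the library; the function name and the example labels are hypothetical.

```python
def lift_score_sketch(y_target, y_predicted, positive_label=1):
    """Lift = precision / base rate of the positive class."""
    n = len(y_target)
    true_pos = sum(
        1
        for t, p in zip(y_target, y_predicted)
        if t == positive_label and p == positive_label
    )
    pred_pos = sum(1 for p in y_predicted if p == positive_label)
    actual_pos = sum(1 for t in y_target if t == positive_label)
    if pred_pos == 0 or actual_pos == 0:
        raise ValueError("lift is undefined without positive labels and predictions")
    precision = true_pos / pred_pos
    base_rate = actual_pos / n
    return precision / base_rate


y_target = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
y_predicted = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
lift = lift_score_sketch(y_target, y_predicted)
print(round(lift, 3))  # 1.111
```

A lift above 1 means the positive predictions hit true positives more often than picking samples at random would.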
Changes
  • Changed default verbosity level in SequentialFeatureSelector to 0
  • The EnsembleVoteClassifier now raises a NotFittedError if the estimator wasn't fit before calling predict. (via Anton Loss)
  • Added new TensorFlow variable initialization syntax to guarantee compatibility with TensorFlow 1.0
Bug Fixes
  • Fixed wrong default value for k_features in SequentialFeatureSelector
  • Cast selected feature subsets in the SequentialFeatureSelector as sets to prevent the iterator from getting stuck if the k_idx are different permutations of the same combination (via Zac Wellmer).
  • Fixed an issue with learning curves that caused the performance metrics to be reversed (via ipashchenko)
  • Fixed a bug that could occur in the SequentialFeatureSelector if there are similarly-well performing subsets in the floating variants (via Zac Wellmer).

v0.5.0

11 Nov 07:20

Version 0.5.0

New Features
  • New ExhaustiveFeatureSelector estimator in mlxtend.feature_selection for evaluating all feature combinations in a specified range
  • The StackingClassifier has a new parameter average_probas that is set to True by default to maintain the current behavior. A deprecation warning was added though, and it will default to False in future releases (0.6.0); average_probas=False will result in stacking of the level-1 predicted probabilities rather than averaging these.
  • New StackingCVClassifier estimator in 'mlxtend.classifier' for implementing a stacking ensemble that uses cross-validation techniques for training the meta-estimator to avoid overfitting (Reiichiro Nakano)
  • New OnehotTransactions encoder class added to the preprocessing submodule for transforming transaction data into a one-hot encoded array
  • The SequentialFeatureSelector estimator in mlxtend.feature_selection now is safely stoppable mid-process by control+c, and deprecated print_progress in favor of a more tunable verbose parameter (Will McGinnis)
  • New apriori function in association to extract frequent itemsets from transaction data for association rule mining
  • New checkerboard_plot function in plotting to plot checkerboard tables / heat maps
  • New mcnemar_table and mcnemar functions in evaluate to compute 2x2 contingency tables and McNemar's test
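The McNemar entry above can be illustrated with a small pure-Python sketch that builds the 2x2 table for two models and computes the continuity-corrected chi-squared statistic. The table layout, function names, and example data are assumptions made for this illustration and are not taken from the evaluate module.

```python
import math


def mcnemar_table_sketch(y_target, y_model1, y_model2):
    """Assumed layout: [[both right, only model 1 right],
                        [only model 2 right, both wrong]]."""
    table = [[0, 0], [0, 0]]
    for t, p1, p2 in zip(y_target, y_model1, y_model2):
        row = 0 if p1 == t else 1
        col = 0 if p2 == t else 1
        if row == 0 and col == 0:
            table[0][0] += 1
        elif row == 0:
            table[0][1] += 1
        elif col == 0:
            table[1][0] += 1
        else:
            table[1][1] += 1
    return table


def mcnemar_chi2(table, corrected=True):
    """Chi-squared statistic and p-value (1 degree of freedom)."""
    b, c = table[0][1], table[1][0]
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    numerator = (abs(b - c) - 1) ** 2 if corrected else (b - c) ** 2
    chi2 = numerator / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model1 = [0, 1, 0, 0, 0, 1, 1, 0, 0, 0]
model2 = [0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
table = mcnemar_table_sketch(y_true, model1, model2)
print(table)  # [[4, 2], [3, 1]]

chi2, p_value = mcnemar_chi2([[9945, 25], [15, 15]])
print(round(chi2, 3), round(p_value, 3))
```

Only the two off-diagonal cells, where the models disagree, enter the statistic; with few discordant pairs an exact binomial test is usually preferred over the chi-squared approximation.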
Changes
  • All plotting functions have been moved to mlxtend.plotting for compatibility reasons with continuous integration services and to make the installation of matplotlib optional for users of mlxtend's core functionality
  • Added a compatibility layer for scikit-learn 0.18 using the new model_selection module while maintaining backwards compatibility to scikit-learn 0.17.
Bug Fixes
  • mlxtend.plotting.plot_decision_regions now draws decision regions correctly if more than 4 class labels are present
  • Raise AttributeError in plot_decision_regions when the X_highlight argument is a 1D array (chkoar)

v0.4.2

25 Aug 02:43

Version 0.4.2 (2016-08-24)

New Features
  • Added preprocessing.CopyTransformer, a mock class that returns copies of
    input arrays via transform and fit_transform
Changes
  • Added AppVeyor to CI to ensure MS Windows compatibility
  • Datasets are now saved as compressed .txt or .csv files rather than being imported as Python objects
  • feature_selection.SequentialFeatureSelector now supports the selection of k_features using a tuple to specify a "min-max" k_features range
  • Added "SVD solver" option to the PrincipalComponentAnalysis
  • Raise an AttributeError with "not fitted" message in SequentialFeatureSelector if transform or get_metric_dict are called prior to fit
  • Use small, positive bias units in TfMultiLayerPerceptron's hidden layer(s) if the activations are ReLUs in order to avoid dead neurons
  • Added an optional clone_estimator parameter to the SequentialFeatureSelector that defaults to True, avoiding the modification of the original estimator objects
  • More rigorous type and shape checks in the evaluate.plot_decision_regions function
  • DenseTransformer now doesn't raise an error if the input array is not sparse
  • API clean-up using scikit-learn's BaseEstimator as parent class for feature_selection.ColumnSelector
Bug Fixes
  • Fixed a problem when a tuple-range was provided as argument to the SequentialFeatureSelector's k_features parameter and the scoring metric was more negative than -1 (e.g., as in scikit-learn's MSE scoring function) via wahutch
  • Fixed an AttributeError issue when verbose > 1 in StackingClassifier
  • Fixed a bug in classifier.SoftmaxRegression where the mean values of the offsets were used to update the bias units rather than their sum
  • Fixed rare bug in MLP _layer_mapping functions that caused a swap between the random number generation seed when initializing weights and biases

v0.4.1

02 May 00:17

Version 0.4.1 (2016-05-01)

New Features
Changes
  • Due to refactoring of the estimator classes, the init_weights parameter of the fit methods was globally renamed to init_params
  • Overall performance improvements of estimators due to code clean-up and refactoring
  • Added several additional checks for correct array types and more meaningful exception messages
  • Added optional dropout to the tf_classifier.TfMultiLayerPerceptron classifier for regularization
  • Added an optional decay parameter to the tf_classifier.TfMultiLayerPerceptron classifier for adaptive learning via an exponential decay of the learning rate eta
  • Replaced old NeuralNetMLP by more streamlined MultiLayerPerceptron (classifier.MultiLayerPerceptron); now also with softmax in the output layer and categorical cross-entropy loss.
  • Unified init_params parameter for fit functions to continue training where the algorithm left off (if supported)

v0.3.0

01 Feb 01:03

Version 0.3.0 (2016-01-31)

  • The mlxtend.preprocessing.standardize function now optionally returns the parameters, which are estimated from the array, for re-use. A further improvement makes the standardize function smarter in order to avoid zero-division errors
  • Added a progress bar tracker to classifier.NeuralNetMLP
  • Added a function to score predicted vs. target class labels evaluate.scoring
  • Added confusion matrix functions to create (evaluate.confusion_matrix) and plot (evaluate.plot_confusion_matrix) confusion matrices
  • Cosmetic improvements to the evaluate.plot_decision_regions function such as hiding plot axes
  • Renaming of classifier.EnsembleClassifier to classifier.EnsembleVoteClassifier
  • Improved random weight initialization in Perceptron, Adaline, LinearRegression, and LogisticRegression
  • Changed learning parameter of mlxtend.classifier.Adaline to solver and added "normal equation" as closed-form solution solver
  • New style parameter and improved axis scaling in mlxtend.evaluate.plot_learning_curves
  • Hide y-axis labels in mlxtend.evaluate.plot_decision_regions in 1 dimensional evaluations
  • Added loadlocal_mnist to mlxtend.data for streaming MNIST from local byte files into numpy arrays
  • New NeuralNetMLP parameters: random_weights, shuffle_init, shuffle_epoch
  • Sequential Feature Selection algorithms were unified into a single SequentialFeatureSelector class with parameters to enable floating selection and toggle between forward and backward selection.
  • New SFS features such as the generation of pandas DataFrame results tables and plotting functions (with confidence intervals, standard deviation, and standard error bars)
  • Added support for regression estimators in SFS
  • Stratified sampling of MNIST (now 500x random samples from each of the 10 digit categories)
  • Added Boston housing dataset
  • Renaming mlxtend.plotting to mlxtend.general_plotting in order to distinguish general plotting functions from specialized utility functions such as evaluate.plot_decision_regions
  • Shuffle fix and new shuffle parameter for classifier.NeuralNetMLP
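The standardize change at the top of this list (returning the estimated parameters for re-use and avoiding zero-division errors) might look roughly like the sketch below. It works on plain lists of rows, and its function name, signature, and parameter-dictionary keys are hypothetical rather than the library's API.

```python
def standardize_sketch(rows, params=None, ddof=0):
    """Column-wise z-scoring; returns the scaled rows and the parameters."""
    n_cols = len(rows[0])
    if params is None:
        means, stds = [], []
        for j in range(n_cols):
            column = [row[j] for row in rows]
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / (len(column) - ddof)
            means.append(mean)
            stds.append(variance ** 0.5)
        params = {"means": means, "stds": stds}
    scaled = [
        [
            # A constant column has std 0; dividing by 1.0 instead avoids
            # a zero-division error and leaves the column at 0.0.
            (row[j] - params["means"][j]) / (params["stds"][j] or 1.0)
            for j in range(n_cols)
        ]
        for row in rows
    ]
    return scaled, params


train = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
test = [[4.0, 5.0]]

train_scaled, fitted = standardize_sketch(train)
# Re-use the training parameters so test data is scaled consistently.
test_scaled, _ = standardize_sketch(test, params=fitted)
print(train_scaled)
print(test_scaled)
```

Re-using the parameters estimated on the training data is what keeps new data on the same scale as the data the model was fit on.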