Releases: rasbt/mlxtend
Version 0.9.1 (2017-11-19)
New Features
- Added `mlxtend.evaluate.bootstrap_point632_score` to evaluate the performance of estimators using the .632 bootstrap. (#283)
- New `max_len` parameter for the frequent itemset generation via the `apriori` function to allow for early stopping. (#270)
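To illustrate what early stopping via `max_len` means, here is a minimal brute-force sketch. It is not mlxtend's `apriori` implementation: the function name `frequent_itemsets` and the `min_support` argument are hypothetical, and candidates are enumerated naively rather than pruned. The only point is that the level-wise search stops once itemsets of size `max_len` have been evaluated.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.5, max_len=None):
    """Level-wise frequent itemset search (illustrative, brute force).

    Hypothetical helper, not part of mlxtend. `max_len` caps the size of
    the itemsets that are evaluated, so larger levels are never visited.
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted({item for basket in baskets for item in basket})
    result = {}
    size = 1
    while max_len is None or size <= max_len:
        found = False
        for candidate in combinations(items, size):
            support = sum(1 for b in baskets if set(candidate) <= b) / n
            if support >= min_support:
                result[frozenset(candidate)] = support
                found = True
        # Stop when a level has no frequent itemsets: no superset can be frequent.
        if not found:
            break
        size += 1
    return result


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
only_singles = frequent_itemsets(transactions, min_support=0.5, max_len=1)
all_sizes = frequent_itemsets(transactions, min_support=0.5)
```

With these four transactions, `max_len=1` returns only the three single items, while the uncapped search also finds the pairs `{a, b}` and `{a, c}`.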
Changes
- All feature index tuples in `SequentialFeatureSelector` are now in sorted order. (#262)
- The `SequentialFeatureSelector` now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. (#262)
- `utils.Counter` now accepts a `name` variable to help distinguish between multiple counters, time precision can be set with the `precision` kwarg, and the new attribute `end_time` holds the time the last iteration completed. (#278 via Mathew Savage)
Bug Fixes
- Fixed a deprecation error that occurred with the McNemar test when using SciPy 1.0. (#283)
Version 0.9.0
New Features
- Added `evaluate.permutation_test`, a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution. In other words, a procedure to test the null hypothesis that two groups are not significantly different (e.g., a treatment and a control group). (#250)
- Added `'leverage'` and `'conviction'` as evaluation metrics to the `frequent_patterns.association_rules` function. (#246 & #247)
- Added a `loadings_` attribute to `PrincipalComponentAnalysis` to compute the factor loadings of the features on the principal components. (#251)
- Allow grid search over classifiers/regressors in ensemble and stacking estimators. (#259)
- New `make_multiplexer_dataset` function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. (#263)
- Added a new `BootstrapOutOfBag` class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. (#265)
- The parameters for `StackingClassifier`, `StackingCVClassifier`, `StackingRegressor`, `StackingCVRegressor`, and `EnsembleVoteClassifier` can now be tuned using scikit-learn's `GridSearchCV`. (#254 via James Bourbeau)
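As a rough illustration of the permutation test idea, the sketch below computes an exact two-sided p-value for the difference of means by enumerating every way of splitting the pooled data. The function name `exact_permutation_pvalue` is hypothetical and does not reflect the signature or options of `evaluate.permutation_test`; full enumeration is only feasible for small samples.

```python
from itertools import combinations


def exact_permutation_pvalue(x, y):
    """Exact two-sided permutation test on the absolute difference of means.

    Hypothetical helper for illustration only. Enumerates all splits of the
    pooled sample into groups of size len(x) and len(y).
    """
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    total = sum(pooled)

    def abs_mean_diff(sum_x):
        return abs(sum_x / n_x - (total - sum_x) / n_y)

    observed = abs_mean_diff(sum(x))
    n_splits = 0
    n_extreme = 0
    for idx in combinations(range(len(pooled)), n_x):
        n_splits += 1
        # Small tolerance so ties with the observed value count as extreme.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits


p_value = exact_permutation_pvalue([1, 2, 3], [4, 5, 6])
```

For these two samples, only 2 of the 20 possible splits are at least as extreme as the observed one, so the p-value is 0.1.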
Changes
- The `'support'` column returned by `frequent_patterns.association_rules` was changed to compute the support of "antecedent union consequent", and new `'antecedant support'` and `'consequent support'` columns were added to avoid ambiguity. (#245)
- Allow the `OnehotTransactions` to be cloned via scikit-learn's `clone` function, which is required by, e.g., scikit-learn's `FeatureUnion` or `GridSearchCV` (via Iaroslav Shcherbatyi). (#249)
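The three support values mentioned above are enough to derive the usual rule metrics. The following sketch uses the standard textbook definitions of confidence, lift, leverage, and conviction; the function `rule_metrics` and its argument names are hypothetical and are not taken from mlxtend's source.

```python
def rule_metrics(antecedent_support, consequent_support, support):
    """Standard association rule metrics for a rule A -> C.

    Hypothetical helper. `support` is the support of "A union C",
    i.e. the fraction of transactions containing both A and C.
    """
    confidence = support / antecedent_support
    lift = confidence / consequent_support
    # Leverage: observed co-occurrence minus what independence would predict.
    leverage = support - antecedent_support * consequent_support
    # Conviction is unbounded when the rule is never violated.
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - consequent_support) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


independent = rule_metrics(0.5, 0.5, 0.25)
```

When antecedent and consequent are statistically independent, as in the example call, lift and conviction are 1 and leverage is 0.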
Bug Fixes
- Fix issues with the `self._init_time` parameter in `_IterativeModel` subclasses. (#256)
- Fix imprecision bug that occurred in `plot_ecdf` when run on Python 2.7. (#264)
- The vectors from SVD in `PrincipalComponentAnalysis` are now being scaled so that `solver='eigen'` and `solver='svd'` store eigenvalues that have the same magnitudes. (#251)
Version 0.8.0
New Features
- Added a `mlxtend.evaluate.bootstrap` that implements the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth). #232
- `SequentialFeatureSelector`'s `k_features` now accepts a string argument "best" or "parsimonious" for more "automated" feature selection. For instance, if "best" is provided, the feature selector will return the feature subset with the best cross-validation performance. If "parsimonious" is provided as an argument, the smallest feature subset that is within one standard error of the cross-validation performance will be selected. #238
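The difference between "best" and "parsimonious" can be sketched as a one-standard-error rule. This is an illustration under assumptions, not mlxtend's code: the helper `select_k`, the shape of `results`, and the choice to measure the standard error at the best-scoring subset are all hypothetical.

```python
def select_k(results, mode="best"):
    """Pick a feature subset size from cross-validation results.

    Hypothetical helper. `results` maps subset size -> (avg_score, std_err).
    Assumption: "parsimonious" means the smallest subset whose score is
    within one standard error of the best score.
    """
    best_k = max(results, key=lambda k: results[k][0])
    if mode == "best":
        return best_k
    best_score, best_std_err = results[best_k]
    threshold = best_score - best_std_err
    return min(k for k, (score, _) in results.items() if score >= threshold)


cv_results = {1: (0.80, 0.02), 2: (0.90, 0.02), 3: (0.91, 0.02), 4: (0.905, 0.02)}
best = select_k(cv_results, mode="best")
parsimonious = select_k(cv_results, mode="parsimonious")
```

Here the best score belongs to the 3-feature subset, but the 2-feature subset is within one standard error of it, so the parsimonious choice is 2.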
Changes
- `SequentialFeatureSelector` now uses `np.nanmean` over normal mean to support scorers that may return `np.nan`. #211 (via mrkaiser)
- The `skip_if_stuck` parameter was removed from `SequentialFeatureSelector` in favor of a more efficient implementation comparing the conditional inclusion/exclusion results (in the floating versions) to the performances of previously sampled feature sets that were cached. #237
- `ExhaustiveFeatureSelector` was modified to consume substantially less memory. #195 (via Adam Erickson)
Bug Fixes
- Fixed a bug where the `SequentialFeatureSelector` selected a feature subset larger than specified via the `k_features` tuple max-value. #213
Version 0.7.0 (2017-06-22)
New Features
- New mlxtend.plotting.ecdf function for plotting empirical cumulative distribution functions (#196).
- New `StackingCVRegressor` for stacking regressors with out-of-fold predictions to prevent overfitting (#201, via Eike Dehling).
Changes
- The TensorFlow estimators have been removed from mlxtend, since TensorFlow now has very convenient ways to build on estimators, which render those implementations obsolete.
- `plot_decision_regions` now supports plotting decision regions for more than 2 training features (#189, via James Bourbeau).
- Parallel execution in `mlxtend.feature_selection.SequentialFeatureSelector` and `mlxtend.feature_selection.ExhaustiveFeatureSelector` is now performed over different feature subsets instead of the different cross-validation folds to better utilize machines with multiple processors if the number of features is large (#193, via @whalebot-helmsman).
- Raise meaningful error messages if pandas `DataFrame`s or Python lists of lists are fed into the `StackingCVClassifier` as `fit` arguments (#198).
- The `n_folds` parameter of the `StackingCVClassifier` was changed to `cv` and can now accept any kind of cross-validation technique that is available from scikit-learn. For example, `StackingCVClassifier(..., cv=StratifiedKFold(n_splits=3))` or `StackingCVClassifier(..., cv=GroupKFold(n_splits=3))` (#203, via Konstantinos Paliouras).
Bug Fixes
- `SequentialFeatureSelector` now correctly accepts a `None` argument for the `scoring` parameter to infer the default scoring metric from scikit-learn classifiers and regressors (#171).
- The `plot_decision_regions` function now supports pre-existing axes objects generated via matplotlib's `plt.subplots` (#184, see example).
- Made `math.num_combinations` and `math.num_permutations` numerically stable for large numbers of combinations and permutations (#200).
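One common way to keep such counts stable for large inputs is to avoid large factorials and floating-point division altogether. The sketch below uses exact integer arithmetic with a multiplicative formula; it is an illustration of that general technique, not the code behind `math.num_combinations` or `math.num_permutations`, and the names `n_choose_k` and `n_permute_k` are hypothetical.

```python
def n_choose_k(n, k):
    """Number of k-combinations of n items without replacement (exact integers)."""
    k = min(k, n - k)  # use the symmetry C(n, k) == C(n, n - k)
    if k < 0:
        return 0
    result = 1
    for i in range(1, k + 1):
        # After this step, result == C(n - k + i, i), which is always an integer.
        result = result * (n - k + i) // i
    return result


def n_permute_k(n, k):
    """Number of k-permutations of n items without replacement (exact integers)."""
    if k > n:
        return 0
    result = 1
    for factor in range(n - k + 1, n + 1):
        result *= factor
    return result


combos = n_choose_k(20, 8)
perms = n_permute_k(5, 2)
```

Because every intermediate value is an integer, results such as `n_choose_k(1000, 500)` stay exact instead of overflowing a float.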
Version 0.6.0 (2017-03-18)
New Features
- A new `association_rules` function generates rules from a list of frequent itemsets (via Joshua Goerner).
Changes
- Adds a black `edgecolor` to plots via `plotting.plot_decision_regions` to make markers more distinguishable from the background in `matplotlib>=2.0`.
- The `association` submodule was renamed to `frequent_patterns`.
Bug Fixes
- The `DataFrame` index of `apriori` results is now unique and ordered.
Version 0.5.1 (2017-02-14)
The CHANGELOG for the current development version is available at
https://github.com/rasbt/mlxtend/blob/master/docs/sources/CHANGELOG.md.
New Features
- The `EnsembleVoteClassifier` has a new `refit` attribute that prevents refitting classifiers if `refit=False` to save computational time.
- Added a new `lift_score` function in `evaluate` to compute lift score (via Batuhan Bardak).
- `StackingClassifier` and `StackingRegressor` support multivariate targets if the underlying models do (via kernc).
- `StackingClassifier` has a new `use_features_in_secondary` attribute like `StackingCVClassifier`.
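For readers unfamiliar with the lift score, a minimal sketch follows. It assumes the common definition for binary classification, precision divided by the share of positive labels in the targets; the function name `lift_score_sketch` and the `positive_label` argument are hypothetical and not the API of `evaluate.lift_score`.

```python
def lift_score_sketch(y_target, y_predicted, positive_label=1):
    """Lift = precision / base rate of the positive class (illustrative).

    Hypothetical helper. Raises ZeroDivisionError if nothing is predicted
    positive or if the targets contain no positive labels.
    """
    n = len(y_target)
    true_pos = sum(
        1
        for t, p in zip(y_target, y_predicted)
        if t == positive_label and p == positive_label
    )
    predicted_pos = sum(1 for p in y_predicted if p == positive_label)
    actual_pos = sum(1 for t in y_target if t == positive_label)
    precision = true_pos / predicted_pos
    base_rate = actual_pos / n
    return precision / base_rate


y_target = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
y_predicted = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
lift = lift_score_sketch(y_target, y_predicted)
```

In this example the precision is 2/3 and 60% of the targets are positive, so the lift is 10/9, or about 1.11. A value above 1 means the predictions select positives more often than random guessing would.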
Changes
- Changed default verbosity level in `SequentialFeatureSelector` to 0.
- The `EnsembleVoteClassifier` now raises a `NotFittedError` if the estimator wasn't `fit` before calling `predict`. (via Anton Loss)
- Added new TensorFlow variable initialization syntax to guarantee compatibility with TensorFlow 1.0.
Bug Fixes
- Fixed wrong default value for `k_features` in `SequentialFeatureSelector`.
- Cast selected feature subsets in the `SequentialFeatureSelector` as sets to prevent the iterator from getting stuck if the `k_idx` are different permutations of the same combination (via Zac Wellmer).
- Fixed an issue with learning curves that caused the performance metrics to be reversed (via ipashchenko).
- Fixed a bug that could occur in the `SequentialFeatureSelector` if there are similarly well-performing subsets in the floating variants (via Zac Wellmer).
Version 0.5.0
New Features
- New `ExhaustiveFeatureSelector` estimator in `mlxtend.feature_selection` for evaluating all feature combinations in a specified range.
- The `StackingClassifier` has a new parameter `average_probas` that is set to `True` by default to maintain the current behavior. A deprecation warning was added though, and it will default to `False` in future releases (0.6.0); `average_probas=False` will result in stacking of the level-1 predicted probabilities rather than averaging these.
- New `StackingCVClassifier` estimator in `mlxtend.classifier` for implementing a stacking ensemble that uses cross-validation techniques for training the meta-estimator to avoid overfitting (Reiichiro Nakano).
- New `OnehotTransactions` encoder class added to the `preprocessing` submodule for transforming transaction data into a one-hot encoded array.
- The `SequentialFeatureSelector` estimator in `mlxtend.feature_selection` is now safely stoppable mid-process by control+c, and `print_progress` was deprecated in favor of a more tunable `verbose` parameter (Will McGinnis).
- New `apriori` function in `association` to extract frequent itemsets from transaction data for association rule mining.
- New `checkerboard_plot` function in `plotting` to plot checkerboard tables / heat maps.
- New `mcnemar_table` and `mcnemar` functions in `evaluate` to compute 2x2 contingency tables and McNemar's test.
Changes
- All plotting functions have been moved to `mlxtend.plotting` for compatibility reasons with continuous integration services and to make the installation of `matplotlib` optional for users of `mlxtend`'s core functionality.
- Added a compatibility layer for `scikit-learn 0.18` using the new `model_selection` module while maintaining backwards compatibility to scikit-learn 0.17.
Bug Fixes
- `mlxtend.plotting.plot_decision_regions` now draws decision regions correctly if more than 4 class labels are present.
- Raise `AttributeError` in `plot_decision_regions` when the `X_highlight` argument is a 1D array (chkoar).
Version 0.4.2 (2016-08-24)
New Features
- Added `preprocessing.CopyTransformer`, a mock class that returns copies of input arrays via `transform` and `fit_transform`.
Changes
- Added AppVeyor to CI to ensure MS Windows compatibility
- Datasets are now saved as compressed .txt or .csv files rather than being imported as Python objects.
- `feature_selection.SequentialFeatureSelector` now supports the selection of `k_features` using a tuple to specify a "min-max" `k_features` range.
- Added "SVD solver" option to the `PrincipalComponentAnalysis`.
- Raise an `AttributeError` with "not fitted" message in `SequentialFeatureSelector` if `transform` or `get_metric_dict` are called prior to `fit`.
- Use small, positive bias units in `TfMultiLayerPerceptron`'s hidden layer(s) if the activations are ReLUs in order to avoid dead neurons.
- Added an optional `clone_estimator` parameter to the `SequentialFeatureSelector` that defaults to `True`, avoiding the modification of the original estimator objects.
- More rigorous type and shape checks in the `evaluate.plot_decision_regions` function.
- `DenseTransformer` now doesn't raise an error if the input array is not sparse.
- API clean-up using scikit-learn's `BaseEstimator` as parent class for `feature_selection.ColumnSelector`.
Bug Fixes
- Fixed a problem when a tuple-range was provided as argument to the `SequentialFeatureSelector`'s `k_features` parameter and the scoring metric was more negative than -1 (e.g., as in scikit-learn's MSE scoring function) (via wahutch).
- Fixed an `AttributeError` issue when `verbose` > 1 in `StackingClassifier`.
- Fixed a bug in `classifier.SoftmaxRegression` where the mean values of the offsets were used to update the bias units rather than their sum.
- Fixed rare bug in MLP `_layer_mapping` functions that caused a swap between the random number generation seed when initializing weights and biases.
Version 0.4.1 (2016-05-01)
New Features
- New TensorFlow estimator for Linear Regression (`tf_regressor.TfLinearRegression`)
- New k-means clustering estimator (`cluster.Kmeans`)
- New TensorFlow k-means clustering estimator (`tf_cluster.Kmeans`)
Changes
- Due to refactoring of the estimator classes, the `init_weights` parameter of the `fit` methods was globally renamed to `init_params`.
- Overall performance improvements of estimators due to code clean-up and refactoring.
- Added several additional checks for correct array types and more meaningful exception messages.
- Added optional `dropout` to the `tf_classifier.TfMultiLayerPerceptron` classifier for regularization.
- Added an optional `decay` parameter to the `tf_classifier.TfMultiLayerPerceptron` classifier for adaptive learning via an exponential decay of the learning rate eta.
- Replaced old `NeuralNetMLP` by more streamlined `MultiLayerPerceptron` (`classifier.MultiLayerPerceptron`); now also with softmax in the output layer and categorical cross-entropy loss.
- Unified `init_params` parameter for fit functions to continue training where the algorithm left off (if supported).
Version 0.3.0 (2016-01-31)
- The `mlxtend.preprocessing.standardize` function now optionally returns the parameters, which are estimated from the array, for re-use. A further improvement makes the `standardize` function smarter in order to avoid zero-division errors.
- Added a progress bar tracker to `classifier.NeuralNetMLP`.
- Added a function to score predicted vs. target class labels, `evaluate.scoring`.
- Added confusion matrix functions to create (`evaluate.confusion_matrix`) and plot (`evaluate.plot_confusion_matrix`) confusion matrices.
- Cosmetic improvements to the `evaluate.plot_decision_regions` function such as hiding plot axes.
- Renaming of `classifier.EnsembleClassifier` to `classifier.EnsembleVoteClassifier`.
- Improved random weight initialization in `Perceptron`, `Adaline`, `LinearRegression`, and `LogisticRegression`.
- Changed the `learning` parameter of `mlxtend.classifier.Adaline` to `solver` and added "normal equation" as closed-form solution solver.
- New `style` parameter and improved axis scaling in `mlxtend.evaluate.plot_learning_curves`.
- Hide y-axis labels in `mlxtend.evaluate.plot_decision_regions` in 1-dimensional evaluations.
- Added `loadlocal_mnist` to `mlxtend.data` for streaming MNIST from local byte files into numpy arrays.
- New `NeuralNetMLP` parameters: `random_weights`, `shuffle_init`, `shuffle_epoch`.
- Sequential Feature Selection algorithms were unified into a single `SequentialFeatureSelector` class with parameters to enable floating selection and toggle between forward and backward selection.
- New `SFS` features such as the generation of pandas `DataFrame` results tables and plotting functions (with confidence intervals, standard deviation, and standard error bars).
- Added support for regression estimators in `SFS`.
- Stratified sampling of MNIST (now 500x random samples from each of the 10 digit categories).
- Added Boston housing dataset.
- Renaming `mlxtend.plotting` to `mlxtend.general_plotting` in order to distinguish general plotting functions from specialized utility functions such as `evaluate.plot_decision_regions`.
- Shuffle fix and new `shuffle` parameter for `classifier.NeuralNetMLP`.
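The `standardize` change at the top of this list (returning the estimated parameters for re-use and avoiding zero-division) can be sketched roughly as follows. This is a plain-Python illustration under assumptions, not mlxtend's implementation: the name `standardize_columns`, the `params` dictionary keys, and the choice to replace a zero standard deviation with 1 are all hypothetical.

```python
from statistics import mean, pstdev


def standardize_columns(rows, params=None):
    """Column-wise z-scoring of a list of rows (illustrative).

    Hypothetical helper. If `params` is None, means and population standard
    deviations are estimated from `rows`; constant columns get a standard
    deviation of 1.0 so that no division by zero occurs.
    """
    if params is None:
        columns = list(zip(*rows))
        params = {
            "avgs": [mean(col) for col in columns],
            "stds": [pstdev(col) or 1.0 for col in columns],
        }
    scaled = [
        [(value - avg) / std for value, avg, std in zip(row, params["avgs"], params["stds"])]
        for row in rows
    ]
    return scaled, params


train = [[1.0, 5.0], [3.0, 5.0]]
train_scaled, train_params = standardize_columns(train)
# Re-use the parameters estimated on the training rows for new data.
test_scaled, _ = standardize_columns([[5.0, 6.0]], params=train_params)
```

Returning the parameters lets new data be scaled with the statistics of the training data rather than its own.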