GitHub

This is a machine learning tools with pipeline.

Here some example to run machine learning project:

from ngocbienml import MyPipeline
pipeline = MyPipeline()
pipeline.fit(data, target)
pipeline.score(new_data, new_target)

Or

from ngocbienml import PipelineKfold
pipline = PipelineKfold()
pipline.fit(data, target)
pipline.score(new_data, new_target)

Where data is pandas dataframe, target is series object. In the above default settting, principal modules of pipeline are:

Fillna by mean
LabelEncoder
Feature Selection: Use 2 methods: variance and correlation
MinMaxScale
LGBClassifier: The default params work well with dataset of 100K rows or more, with minimum of 20 features. It deals well with unbalanced dataset. In the above default setting 10% of dataset will be cut for test set if not using kfold or 5 folds in other case.

You can use to save and reload pipeline for a long usage.

from joblib import dump, load
dump(pipeline, path)
pipeline = load(path)
pipeline.score(data, target)

You can use include many preprocessing classes like Fillna, Scale, or Labelencoder in your customized pipeline. Note that actually, you can not use full label encoder by sklearn

from ngocbienml import Scale, Fillna, Labelencoder, ModelWithPipeline
from sklearn.pipeline import Pipeline
pipeline = Pipeline([('label_encoder', Labelencoder()),
                    ('fillna', Fillna()), 
                    ('scale', Scale()),
                    ('model', ModelWithPipeline())])
pipeline.fit(data, target)
pipline.score(test,  y_test)

from ngocbienml import PipelineKfold
pipeline = PipelineKfold()
pipeline.fit(data, target)
pipline.score(test, y_test)

We can use only pipeline to transform data, and then use it for other task

from ngocbienml import AssertGoodHeader, Fillna, LabelEncoder,\
FeatureSelection, FillnaAndDropCatFeat, MinMaxScale
from sklearn.pipeline import Pipeline
steps = [('assertGoodheader', AssertGoodHeader()),
        ('Fillna', Fillna()),
        ('LabelEncoder', LabelEncoder()),
        ('FillnaAndDropCatFeat', FillnaAndDropCatFeat()),
        ('MinMaxScale', MinMaxScale()),
        ('FeatureSelection', FeatureSelection())]

pipline = Pipeline(steps = steps)
df_transformed = pipline.fit_transform(df)

Or the simplest way is to use the default params

from ngocbienml import MyPipeline
pipline = Pipeline(model_name=None) #do not use model
df_transformed = pipline.fit_transform(df)

In the above code, df_tranformed is numeric data frame with the same header of df. df_transformed is ready to train by any model.

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(df_transformed, label)

You can run deeplearning, for example

classifier = MyPipeline( model_name='dl', 
                             epochs=3000,
                             hidden_layers=[64, 32], 
                             activation=['relu', 'sigmoid'],
                             dropout=.1)
classifier.fit(data, target)

model_name is dl or lgb or logistic regression. The default is lgb.

Use search cv for hyper params tuning:

from ngocbienml import SearchCv
SearchCv(n_iter=100).fit(data, target)

This tool will break down n_iter to small step and save at the and of these step, to ensure that you do not loss everything if you shut down your PC before the end of running. You can re-runing this to refit and fit the better params

What's next:

More setting in feature extraction and modelling.
More metric and visualization

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.idea		.idea
ngocbienml		ngocbienml
test		test
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

License

ngocbien/ngocbienml

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages