Klearn: Data Science and Machine Learning Tool Kits for Kagglers

Good Job!!! I am glad that you just found Klearn.

Klearn is a Python package that speeds up the data science and machine learning research workflow. It embraces data science best practices and aims to empower data scientists. It bundles several of the most commonly used data science modules, including but not limited to EDA, feature engineering, cross-validation strategies, hold-out data scoring, and model ensembling.

Klearn is compatible with: Python 2.7-3.6.

Some principles

User friendliness. Klearn is designed for data science beginners. Klearn follows best practices for reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear and actionable feedback upon user error.
Modularity. A data science research project is understood as a sequence of tasks including EDA, feature engineering, and model selection/benchmarking. Each module in Klearn is responsible for one task in a data scientist's routine research workflow.
Easy extensibility. New modules are simple to add (as new classes and functions), and existing modules provide ample examples. Being able to easily create new modules allows for total expressiveness, making Klearn suitable for advanced research.
Work with Python. There are no separate model configuration files in a declarative format. Models are described in Python code, which is compact, easier to debug, and easy to extend.

Module structure

The main modules of the Klearn API are:

datasets, which is responsible for dumping data in a specific format (for example, libffm)
eda, which is responsible for data visualization and exploratory analysis
ensemble, which is responsible for combining models together
model_selection, which holds cross-validation (CV) strategy classes and scoring functions
models, which provides higher-level wrappers for machine learning models
preprocessing, which is responsible for data cleaning and feature engineering
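
To make the roles of model_selection and ensemble more concrete, here is a minimal, standard-library-only sketch of the two ideas they cover: splitting row indices into cross-validation folds and blending several models by averaging their predictions. The function names k_fold_indices and average_predictions are hypothetical and chosen for illustration only; they are not Klearn's API, and Klearn's own classes may work differently.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, valid_idx) pairs that partition range(n_samples).

    Hypothetical helper for illustration; not part of Klearn.
    """
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


def average_predictions(prediction_lists):
    """Blend models by taking the element-wise mean of their predictions.

    Hypothetical helper for illustration; not part of Klearn.
    """
    n_models = len(prediction_lists)
    return [sum(values) / float(n_models) for values in zip(*prediction_lists)]


# Two folds over five rows: each row is used for validation exactly once.
folds = list(k_fold_indices(n_samples=5, n_splits=2))

# Two models, two rows each: the blend is the per-row mean.
blended = average_predictions([[0.25, 1.0], [0.75, 0.5]])
```

Simple averaging is only one way to combine models; it is used here because it needs no extra dependencies and is easy to check by hand.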

The complete file-structure for the project is as follows:

klearn/
    klearn/
        datasets/
            libffm_format.py
        eda/
            eda.py
            plotly.py
            seaborn.py
        ensemble/
            dispatch.py
            ensemble.py
        model_selection/
            metrics.py
            scorers.py
            split.py
        models/
            modifiers.py
            trainers.py
            transformers.py
        preprocessing/
            cleaners.py
            features.py
            targets.py
        logger.py
        utils.py
    images/
        ...random stuff

    README.md
    LICENSE
    requirements.txt
    setup.py
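
The tree above lists datasets/libffm_format.py. For readers unfamiliar with the libffm text format, each line has the shape "label field:feature:value ...". The sketch below shows one way categorical rows could be encoded that way using only the standard library. The function to_libffm_lines is a hypothetical name introduced here, and the index assignment scheme is an assumption; it does not reproduce Klearn's implementation.

```python
def to_libffm_lines(rows, labels):
    """Encode categorical rows as libffm lines: 'label field:feature:value ...'.

    Hypothetical helper for illustration; not part of Klearn.
    """
    field_ids = {}    # column name -> field index
    feature_ids = {}  # (column name, value) -> feature index
    lines = []
    for row, label in zip(rows, labels):
        parts = [str(label)]
        # Sort columns so the output is deterministic on every Python version.
        for column, value in sorted(row.items()):
            field = field_ids.setdefault(column, len(field_ids))
            feature = feature_ids.setdefault((column, value), len(feature_ids))
            # Categorical features are one-hot, so the value is always 1.
            parts.append("{}:{}:1".format(field, feature))
        lines.append(" ".join(parts))
    return lines


rows = [
    {"city": "SF", "device": "ios"},
    {"city": "NY", "device": "ios"},
]
libffm_lines = to_libffm_lines(rows, labels=[1, 0])
```

Both rows share the same feature index for device "ios", while the two cities get distinct feature indices within the same field.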

Installation

Install Klearn from PyPI (NOT supported for now):

sudo pip install klearn

Alternatively: install Klearn from the GitHub source (recommended):

First, clone Klearn using git:

git clone https://github.com/KevinLiao159/klearn.git

Then, cd to the Klearn folder and run the install command:

cd klearn
sudo python setup.py install
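
After installing, a quick way to confirm that the package is visible to your interpreter is to try importing it. The snippet below is a small sketch using only the standard library; it assumes the import name is klearn (matching the inner klearn/ folder in the tree above), and is_installed is a hypothetical helper name.

```python
import importlib


def is_installed(package_name):
    """Return True if the package can be imported in this environment."""
    try:
        importlib.import_module(package_name)
    except ImportError:
        return False
    return True


# Assumes the import name is "klearn", as suggested by the project layout.
print("klearn installed: {}".format(is_installed("klearn")))
```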
