Klearn is a Python module that speeds up the data science and machine learning research workflow. It embraces data science best practices and is committed to empowering data scientists. It bundles several of the most commonly used data science modules, including but not limited to an EDA module, a feature engineering module, cross-validation strategies, hold-out data scoring, and model ensembling.
Klearn is compatible with: Python 2.7-3.6.
- User friendliness. Klearn is designed for data science beginners. It follows best practices for reducing cognitive load: it offers consistent and simple APIs, minimizes the number of user actions required for common use cases, and provides clear, actionable feedback on user error.
- Modularity. A data science research project is understood as a sequence of tasks, including EDA, feature engineering, and model selection/benchmarking. Each module in Klearn is responsible for one task in a data scientist's routine research workflow.
- Easy extensibility. New modules are simple to add (as new classes and functions), and existing modules provide ample examples. Being able to create new modules easily allows for total expressiveness, making Klearn suitable for advanced research.
- Work with Python. There are no separate model configuration files in a declarative format. Models are described in Python code, which is compact, easier to debug, and easy to extend.
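As a rough illustration of the extensibility and "work with Python" principles, the sketch below defines a new preprocessing step as a plain Python class. The class name `MedianImputer` and its `fit`/`transform` methods are hypothetical and simply follow the common scikit-learn convention; they are not taken from Klearn's actual API.

```python
from statistics import median


class MedianImputer:
    """Hypothetical cleaner: fills missing values (None) with the column median."""

    def fit(self, column):
        # Learn the median from the observed (non-missing) values only.
        observed = [v for v in column if v is not None]
        self.median_ = median(observed)
        return self

    def transform(self, column):
        # Replace each missing value with the median learned in fit().
        return [self.median_ if v is None else v for v in column]


ages = [22, None, 35, 41, None]
imputer = MedianImputer().fit(ages)
filled = imputer.transform(ages)
print(filled)
```

Because the step is ordinary Python, it can be debugged with the usual tools and combined with other steps without any configuration file.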
The main modules of the Klearn API are:
- `datasets`, which is responsible for dumping data in certain formats
- `eda`, which is responsible for data visualization and exploratory analysis
- `ensemble`, which is responsible for combining models together
- `model_selection`, which holds CV strategy classes and scoring functions
- `models`, which is for higher-level wrappers of machine learning models
- `preprocessing`, which is responsible for data cleaning and feature engineering
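To give a feel for the kind of cross-validation strategy and scoring function a module like `model_selection` holds, here is a minimal, dependency-free sketch. The function names `k_fold_indices` and `accuracy` are assumptions made for illustration only and do not reflect Klearn's real classes or signatures.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs for a simple, unshuffled k-fold split."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / float(len(y_true))


folds = list(k_fold_indices(n_samples=5, n_folds=2))
for train_idx, valid_idx in folds:
    print(train_idx, valid_idx)
```

Each sample lands in exactly one validation fold, which is the property any CV strategy of this kind needs to preserve.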
The complete file-structure for the project is as follows:
    klearn/
        klearn/
            datasets/
                libffm_format.py
            eda/
                eda.py
                plotly.py
                seaborn.py
            ensemble/
                dispatch.py
                ensemble.py
            model_selection/
                metrics.py
                scorers.py
                split.py
            models/
                modifiers.py
                trainers.py
                transformers.py
            preprocessing/
                cleaners.py
                features.py
                targets.py
            logger.py
            utils.py
        images/
            ...random stuff
        README.md
        LICENSE
        requirements.txt
        setup.py
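The file name `datasets/libffm_format.py` suggests that the `datasets` module writes data in the libffm text format, where each line is a label followed by `field:feature:value` triples. The helper below is only a sketch of that format; `to_libffm_line` is a hypothetical name, not a function taken from Klearn.

```python
def to_libffm_line(label, features):
    """Format one row as a libffm line.

    `features` is a list of (field_index, feature_index, value) triples.
    """
    parts = [str(label)]
    for field, index, value in features:
        # Each feature is written as field:feature:value.
        parts.append("{}:{}:{}".format(field, index, value))
    return " ".join(parts)


row = to_libffm_line(1, [(0, 3, 1), (1, 7, 0.5)])
print(row)
```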
- Install Klearn from PyPI (NOT supported for now):
sudo pip install klearn
- Alternatively: install Klearn from the GitHub source (recommended):
First, clone Klearn using `git`:
git clone https://github.com/KevinLiao159/klearn.git
Then, `cd` to the Klearn folder and run the install command:
cd klearn
sudo python setup.py install
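After installing, a quick way to confirm that the package is importable is a small check like the one below. It assumes the importable package name is `klearn` (taken from the repository's directory name); the helper `is_installed` is illustrative and not part of Klearn.

```python
import importlib


def is_installed(package_name):
    """Return True if the package can be imported in this environment."""
    try:
        importlib.import_module(package_name)
    except ImportError:
        return False
    return True


# Assumed package name; prints True once the install above has succeeded.
print(is_installed("klearn"))
```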