An experimental platform-independent machine learning library

andb0t/MLlab

MLlab

This is an experimental platform-independent machine learning library. Born from the desire to implement modern machine learning algorithms by hand, this project has grown considerably and now provides basic algorithms for various classification, regression and clustering tasks.

For further information on implemented algorithms and usage examples, please consult the project's website.

Implemented algorithms

Please consult the API documentation for detailed and up-to-date information on the algorithms, e.g. the implemented hyperparameters.

Classification

  • random
  • logistic regression
  • perceptron
  • k-nearest neighbors
  • decision tree
  • multilayer neural network
  • naive Bayes
  • boosted decision tree
  • random forest
  • SVM with linear kernel
  • SVM with non-linear kernel (see here or here)
  • convolutional neural network
  • recurrent neural network

Regression

  • random
  • linear regression
  • decision tree
  • Bayes
  • k-nearest neighbors
  • neural network regression
  • SVM

Clustering

  • k-means
  • self-organizing map
  • hierarchical clustering
  • expectation-maximization
  • mean-shift

Misc

  • extension of linear models to polynomial dependencies via feature transformation
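
The feature transformation mentioned above can be illustrated with a short Python sketch. It is not MLlab's implementation; the function name `polynomial_features` is made up for this example. The idea is that a linear model fitted on the expanded features is polynomial in the original ones.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree):
    """Expand one feature vector into all monomials of total degree 1..degree.

    Hypothetical helper for illustration only, not part of MLlab's API.
    """
    features = []
    for d in range(1, degree + 1):
        # each index tuple picks the factors of one monomial, e.g. (0, 1) -> x0 * x1
        for idx in combinations_with_replacement(range(len(row)), d):
            features.append(prod(row[i] for i in idx))
    return features


if __name__ == "__main__":
    # two features x0 = 2, x1 = 3, expanded up to degree 2:
    # x0, x1, x0*x0, x0*x1, x1*x1
    print(polynomial_features([2.0, 3.0], degree=2))
```

Feeding the expanded rows to a plain linear regression or logistic regression then lets the model fit curved dependencies without changing the learning algorithm itself.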

Installation

This project uses sbt as its build tool.

Linux

Install a Java JDK and then sbt:

echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
sudo apt-get update
sudo apt-get install sbt
# sbt new sbt/scala-seed.g8  # set up a dummy project

Windows

Download and install a Java JDK and sbt.

Execution

From sbt:

cd mllab
sbt
# compile and run the app, use ~ for automatic updates and recompilation
[~]run # run the default random classifier
run --help  # get more information on options and commands
test  # compile and execute tests  
compile  # only compile the app
console  # start scala console for this project

Create test data in the data directory:

python3 bin/create_data.py --reg linear  # create dummy regression data
python3 bin/create_data.py --clf circles  # create dummy classification data

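Then run MLlab on it, e.g. with sbt "run --clf DecisionTree --data data" (the quotes make sbt pass the options to run instead of treating each one as a separate sbt command).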
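
For readers who want to see what dummy regression data of this kind might look like, here is a minimal Python sketch. It is not the code of bin/create_data.py; the function names, the CSV layout with an `x,y` header, and the default parameters are all assumptions made for illustration.

```python
import csv
import random


def make_linear_data(n, slope=2.0, intercept=1.0, noise=0.1, seed=0):
    """Return n (x, y) pairs with y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the data is reproducible
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = slope * x + intercept + rng.gauss(0.0, noise)
        rows.append((x, y))
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line (assumed layout)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y"])
        writer.writerows(rows)


if __name__ == "__main__":
    # print a few samples instead of writing a file
    for x, y in make_linear_data(3):
        print(f"{x:.3f}, {y:.3f}")
```

Calling `write_csv(make_linear_data(100), path)` with a path of your choice would store the samples on disk; the file format MLlab actually expects should be taken from bin/create_data.py itself.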

Development

Docker

Create the image yourself and publish it:

docker build -t mllab .  # build the image
docker run -it mllab bash  # run it interactively
docker login
docker tag mllab andbot/mllab  # add optional tag with `:tag`
docker push andbot/mllab

or download the latest version from docker hub:

docker pull andbot/mllab  # pull it
docker run andbot/mllab  # pull & run it
docker run -it andbot/mllab bash  # open interactive shell - don't forget to run `./init.sh` by hand!

Create executable jar

This will package everything in a fat jar, using sbt-assembly.

sbt assembly

Run the compiled jar, e.g. from Python as in examples/run_jar.py.
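
As a rough idea of how a jar can be launched from Python, the sketch below uses the standard `subprocess` module. It is not the content of examples/run_jar.py; the helper names and the jar path are placeholders, and it assumes the jar accepts the same options as `run` in sbt.

```python
import subprocess


def build_command(jar_path, *app_args):
    """Build the argument list for `java -jar <jar> <args...>`."""
    return ["java", "-jar", str(jar_path), *app_args]


def run_jar(jar_path, *app_args):
    """Run the jar and return its standard output.

    Raises subprocess.CalledProcessError if the process exits with a non-zero code.
    """
    result = subprocess.run(
        build_command(jar_path, *app_args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # placeholder jar location; sbt assembly reports the real path when it finishes
    print(build_command("path/to/mllab.jar", "--clf", "DecisionTree", "--data", "data"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.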

Create API documentation

sbt doc

Style check and linter

This checks the code style using scalastyle and the Linter Compiler Plugin.

sbt
scalastyle  # style check
[compile, run]  # linter runs as compilation hook

Contribution

Everyone is welcome to contribute! I would especially appreciate support with

  • a web interface to try the library out: select datasets, algorithms and hyperparameters, run the analysis and do grid-search hyperparameter optimization

PRs and issues are always welcome.

Testing

Please write unit tests for your methods.

Contributors

This code is developed and maintained by me. List of contributors in alphabetical order:

  • Simon Spannagel
  • maybe you? 😆

Some useful links

Some remarks about Scala

Some remarks about Spark
