TODO cluster by tasks (instead of algorithms)
This repository contains Jupyter notebooks to accompany the tutorials for our data science lectures. The following topics are covered (each in a separate folder).
- Dataset Visualization: working with and visualizing one dataset (Boston Housing without the linear regression; also other datasets such as Flower, MNIST-digits, 20newsgroups), incl. Matplotlib, the .describe() method, box plots, and min-max normalization; linear reg c/o dsP
- Clustering
- Association Rule Learning (dataset yet to be determined; preferably from scikit learn)
- Regression (linear regression from Boston Housing and Car Prices)
- Bayes Learning (for spam filtering/text classification)
- Classification with Decision Trees (start with small 5-line dataset)
- Neural Networks (use keras.io to build a neural network for MNIST-digit classification; here's a tutorial); OPTIONAL use gensim (for word2vec; pick dataset from tensorflow); then auto-encoder for representation learning
- OPTIONAL MapReduce
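The Dataset Visualization topic above mentions summary statistics via .describe() and min-max normalization. A minimal sketch of both steps, assuming a small hypothetical table (the column names and values are made up for illustration and are not taken from any notebook in this repository):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "rooms": [4.0, 6.0, 8.0, 5.0],
    "price": [100.0, 200.0, 300.0, 150.0],
})

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = df.describe()
print(summary)

# Min-max normalization: rescale each column linearly so that its
# minimum maps to 0 and its maximum maps to 1.
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(scaled)
```

Note that .describe() is a method and has to be called with parentheses. The same rescaling could also be written by hand as (df - df.min()) / (df.max() - df.min()); the scaler is used here because it can later apply the identical transformation to unseen data.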
The main packages are pandas and scikit-learn.
See our python-tutorials for instructions on how to set this up on your machine.
- Python (>= 2.7 or >= 3.3)
- NumPy (>= 1.6.1)
- SciPy (>= 0.9)
- scikit-learn (>= 0.18.1); documentation, also as pdf with Quick Start and Tutorials
- Matplotlib (>= 2.1.1)
- Pandas; documentation, also as pdf
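To compare a local installation against the requirements above, the installed versions can be printed. This is only a sketch: it assumes Python 3.8 or newer (for importlib.metadata), and the helper name installed_versions is hypothetical, not something provided by this repository.

```python
from importlib import metadata

# Distribution names as published on PyPI (assumed to match the list above).
PACKAGES = ["numpy", "scipy", "scikit-learn", "matplotlib", "pandas"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

The sketch only reports versions; comparing them against the minimum versions listed above is left to the reader.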
-
- Scikit-learn-overview.ipynb
- Web Mining Project .ipynb
-
- 1-IRIS.ipynb
- 2-Boston_house_dataset.ipynb
- 3-MNIST.ipynb
- 4-UCI_CAR.ipynb
- 5-20newsgroups.ipynb
- 6-KDD_cup_2000_data_set.ipynb
-
- Apriori_asaini.ipynb (MBE_dataset)
- Apriori.ipynb (Boston house)
- Apriori_server.ipynb (Mango_dataset)
- Assignment_Association_rule_learning.ipynb
- Tutorial_association_rule_learning_shopping_basket.ipynb (KDDcup 2000)
-
- Assignment_Linear_Regression.ipynb
- Assignment_Logistic_regression.ipynb (UCI_car)
- Boston_house_Linear_Regression.ipynb (Boston house)
- Linear_regression_diabetes_dataset.ipynb
- Linear-Regression.ipynb (Boston house)
- Logistic_regression.ipynb (IRIS)
- Small_scale_linear_regression.ipynb (KDDcup)
- Supervised_Learning_with_Linear_Models.ipynb (Boston house)
-
- Bayes-Learning.ipynb (IRIS, 20newsgroups)
-
- 7-Decision-Trees.ipynb (UCI_car)
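The topic list suggests starting decision trees with a small five-line dataset. One possible sketch of that idea, assuming a hypothetical dataset loosely modelled on car evaluation (the feature names, encodings, and labels are invented here and are not the UCI_car data), and assuming scikit-learn 0.21 or newer for export_text:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical five-row dataset: [buying_cost, safety],
# both encoded from 0 (low) to 2 (high).
X = [[0, 2], [1, 2], [2, 2], [0, 0], [2, 0]]
y = ["acceptable", "acceptable", "acceptable", "unacceptable", "unacceptable"]

# Entropy-based splitting; random_state fixes tie-breaking for repeatability.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Print the learned rules as plain text.
print(export_text(clf, feature_names=["buying_cost", "safety"]))

# Classify an unseen car with medium cost and low safety.
prediction = clf.predict([[1, 0]])
print(prediction[0])
```

In this toy data the label depends only on safety, so a single split on that feature separates the classes, which makes the printed tree easy to check by hand.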
-
- keras-mnist.ipynb (MNIST)
- Simple-NN.ipynb (make_moons)
- Stacked-Denoising-Autoencoders.ipynb
- INFO Software Comparison
- keras.io (high-level, running on top of TensorFlow (default) or Theano) c/o Francois Chollet (written in Python)
- Theano c/o Universite de Montreal (written in Python; tightly integrated with NumPy)
- TensorFlow c/o Google Brain (written in Python/C++)
-
- NLP-with-NLTK-Short-Intro.ipynb
- Python for Science and Machine Learning: Reading List
- cheat sheets for machine learning
- Cheat Sheet of Machine Learning and Python (and Math) Cheat Sheets
- Essential Cheat Sheets for Machine Learning and Deep Learning Engineers
- large repository of Data Science Python Notebooks
- TODO compare to Christin Seifert's https://github.com/chseifert/tutorials/tree/master/data-science
(should be listed at the module)