Skip to content

Latest commit

 

History

History
62 lines (45 loc) · 4.07 KB

README.md

File metadata and controls

62 lines (45 loc) · 4.07 KB

Machine Learning Projects

A collection of R and Python ML projects performed on real-world and practical datasets.

Table of Contents

What is this?

This is a collection of R and Python of various ML projects. Widely used and practical algorithms are developed with minimum dependency. These projects are conducted as assignments for the "Applied Machine Learning" Course taught at UIUC by Dr. Forsyth. The projects include a wide variety major ML algorithms conducted on real-world and practical datasets. This README only shows some examples of these projects. For more details and source codes check the page for each project individually.

Naive Bayesian Analysis & Random Forrest

UC Irvine data on diabetes patient are classified using a manually developed naive Bayes classifier as well as klaR and SVMLight packages for R. The case of patients with missing data are dealt using various methods and results are compared.

Classifier Accuracy Note
Naive Bayes 0.752 Using dnorm () Function
Naive Bayes 0.748 Replacing 0 values with "NA"
Naive Bayes 0.778 Using klaR Library
SVM 0.810 Using SVMLight Library

MNIST images are classified using Random Forrest method by h2o R Package.

# of Trees Depth = 4 Depth = 8 Depth = 16
10 0.85 0.93 0.96
20 0.87 0.94 0.96
30 0.87 0.94 0.97

Source code and details

Support Vector Machine

A support vector machine is developed and trained on "UC Irvine data on adult income" using stochastic gradient descent with various regularization constants. 0.001 is chosen as the best regularization constant w/ an accuracy of 87% on test data.

Principal Components Analysis

A PCA was performed on CIFAR-10 dataset. The first 20 principal components were computed and used for representing the images of each category. Each class is tried to be represented by PC of other classes and therefore, the similarities of all 10 classes are measured.

.

Linear and Logistic Regression

UCI dataset on features of music and latitude and longitude of their origins is analyzed. A linear regression is developed for predicting the latitudes against features and is improved using Box-Cox transformation. Also, regularized regressions are developed using glmnet Package. Logistic regression are also developed on another UCI dataset to predict a binary logistic classification.

Expectation Maximization

Images are segmented using clustering. For this image pixels are clustered and then images are segmented by mapping each pixel to the cluster center with the highest value of the posterior probability for that pixel.

Authors