For an overview, read our poster. The paper covers the algorithms and implementation details in more depth. The poster is slightly outdated, so refer to the paper for the most current results.
We use scikit-learn to train and test every model except the neural networks. The Python scripts in ./python/ each train one of the models listed below:
- Multiclass Logistic Regression
- Support Vector Machine (one-vs-all)
- Random Forest
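As a minimal sketch of this workflow, the three model types can be trained and scored with scikit-learn as shown below. Synthetic data from `make_classification` stands in for the arrhythmia features. The function names (`build_models`, `train_and_score`), the 70/30 split, the linear SVM kernel, and all hyperparameters are hypothetical and are not taken from the scripts in ./python/.

```python
# Hypothetical sketch: train and score the three scikit-learn model types.
# Synthetic data replaces the arrhythmia features; settings are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(seed=0):
    """Return the three model types keyed by name (hypothetical settings)."""
    return {
        # With the default lbfgs solver, LogisticRegression fits a multinomial
        # (softmax) model when there are more than two classes.
        "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        # OneVsRestClassifier trains one binary SVM per class (one-vs-all).
        "svm_ova": make_pipeline(StandardScaler(), OneVsRestClassifier(SVC(kernel="linear"))),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }


def train_and_score(X, y, test_size=0.3, seed=0):
    """Split the data, fit every model, and return held-out accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    return scores


if __name__ == "__main__":
    X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                               n_classes=4, random_state=0)
    for name, acc in train_and_score(X, y).items():
        print(f"{name}: {acc:.3f}")
```

Note that `SVC` on its own handles multiclass problems with a one-vs-one scheme; wrapping it in `OneVsRestClassifier` yields the one-vs-all variant listed above.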
The neural networks are implemented in MATLAB with the Neural Network Toolbox. These files are in the ./matlab/ folder, and each one is explained below:
- ./matlab/nn_main.m: iteratively trains multiple neural networks while varying the hyperparameters, the training set size, and the train/test ratio.
- ./matlab/nn_single_iter.m: trains a single neural network with the specified parameters.
- ./matlab/make_rse_plots.m and ./matlab/make_accuracy_plots.m: read the result files produced by the neural network tuning stage and generate formatted plots.
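The split between nn_main.m (the sweep) and nn_single_iter.m (one training run) can be illustrated with the Python analogue below. It swaps in scikit-learn's `MLPClassifier` for the MATLAB Neural Network Toolbox, so it is not the repository's implementation. The names `single_iter` and `sweep`, the grid values, and the choice to emulate smaller datasets by taking the first `n_samples` rows are all hypothetical.

```python
# Hypothetical Python analogue of the MATLAB tuning loop, using scikit-learn's
# MLPClassifier in place of the Neural Network Toolbox.
import itertools
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def single_iter(X, y, hidden_units, test_ratio, seed=0):
    """Train one network and return its test accuracy (stand-in for nn_single_iter.m)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_ratio, random_state=seed
    )
    net = MLPClassifier(hidden_layer_sizes=(hidden_units,), max_iter=500, random_state=seed)
    with warnings.catch_warnings():
        # Short runs may stop before full convergence; silence that warning here.
        warnings.simplefilter("ignore", ConvergenceWarning)
        net.fit(X_train, y_train)
    return net.score(X_test, y_test)


def sweep(X, y, hidden_grid=(10, 25), size_grid=(100, 200), ratio_grid=(0.2, 0.3), seed=0):
    """Train one network per combination of settings (stand-in for nn_main.m)."""
    results = []
    for hidden, n_samples, ratio in itertools.product(hidden_grid, size_grid, ratio_grid):
        # Use only the first n_samples rows to emulate a smaller dataset.
        acc = single_iter(X[:n_samples], y[:n_samples], hidden, ratio, seed)
        results.append({"hidden": hidden, "n_samples": n_samples,
                        "test_ratio": ratio, "accuracy": acc})
    return results


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                               n_classes=3, random_state=0)
    for row in sweep(X, y):
        print(row)
```

Keeping the single-run function separate from the loop mirrors the two MATLAB files and makes it easy to rerun one configuration in isolation.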
To train the networks on a GPU, you need MathWorks' Parallel Computing Toolbox.
We use the Arrhythmia Data Set from the UCI Machine Learning Repository. Our imputed dataset, ./data/data_clean_imputed.csv, contains the clean data. As described in the paper, ./data/pca.csv contains the principal components of the clean dataset. The MATLAB code used to impute the original dataset is in ./matlab/impute.m and is also compatible with Octave.
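The imputation in this repository is done in MATLAB, and the method used by impute.m is not described here. As an illustration only, the numpy sketch below assumes column-mean imputation (a hypothetical stand-in) and then computes principal components from an SVD of the mean-centered data, which is one standard way to obtain them. The function names are hypothetical, and the exact PCA settings behind ./data/pca.csv may differ.

```python
# Illustrative sketch: mean imputation (assumed, not necessarily impute.m's method)
# followed by PCA via the SVD of the centered data matrix.
import numpy as np


def impute_column_means(X):
    """Return a copy of X with each NaN replaced by its column's mean (assumed strategy)."""
    X = np.array(X, dtype=float)  # np.array copies, so the input is not modified
    col_means = np.nanmean(X, axis=0)  # column means ignoring missing values
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def principal_components(X, k):
    """Project X onto its top-k principal components.

    Returns the component scores (n_samples x k) and the fraction of total
    variance explained by each of the k components.
    """
    Xc = X - X.mean(axis=0)  # PCA requires mean-centered columns
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T  # rows of Vt are the principal directions
    explained = (S ** 2)[:k] / np.sum(S ** 2)
    return scores, explained


if __name__ == "__main__":
    X = np.array([[1.0, np.nan, 3.0],
                  [2.0, 5.0, np.nan],
                  [4.0, 6.0, 9.0],
                  [3.0, 7.0, 8.0]])
    filled = impute_column_means(X)
    scores, explained = principal_components(filled, k=2)
    print(filled)
    print(scores.shape, explained)
```

A column that is entirely missing would have an undefined mean, so real data may need such columns dropped before imputation.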