2018 NCAA March Madness Men's Basketball Predictions

By Brice Walker

The Bracket (Go Bulldogs!)

Outline

Getting Started
Introduction
Feature Engineering and Extraction
Classification Analysis
Libraries

Getting Started

I have provided a few scripts so you can run this project quickly on your own machine. The project was built and tested on Python 3.6, and Python 3.6+ is recommended, ideally through Anaconda. Jupyter Notebook is also required.

Run the following commands from a terminal in the repo directory:

pip install -r requirements.txt
python prepare_dataset.py
python train_model.py

Note: You may need to install XGBoost from source.

To take advantage of TensorFlow's GPU acceleration, you may also need to replace the CPU build: run pip uninstall tensorflow, then run pip install tensorflow-gpu.

I have also provided a Jupyter notebook that walks you through the iterative process.

Note: This project has been optimized for deployment on Clusterone.

To run this project on Clusterone, use clusterone.py.

Introduction

This is a classification project completed for the 2018 March Madness Kaggle competition. I extracted 18 season-based and 28 tournament-based team-level characteristics from several datasets covering 1994-2017. The data came from Kaggle and from scraping sports-reference.com. I then engineered several advanced measures and computed Elo ratings, and used these characteristics to predict a win probability for each matchup in the 2018 March Madness schedule. The predictions come from a well-calibrated soft voting classifier that combines KNeighbors, Random Forest, Extra Trees, Logistic Regression, Gradient Boosting, and LightGBM classifiers. I also trained a Keras/TensorFlow neural network. Finally, I developed predictions based on Microsoft's TrueSkill rating system and weighted them with the machine learning model predictions.

Feature Engineering and Extraction

This project attempts to predict game outcomes from the following team-level characteristics for regular season games:

A modified Elo rating (where new entrants are initialized at a score of 1500, and there is no reversion to the mean between seasons)
Number of wins
Avg points per game scored
Avg points per game allowed
Avg 3-pointers per game
Avg turnovers per game
Avg assists per game
Avg rebounds per game
Avg steals per game
Power 6 Conference
Regular season championships
Strength of team's schedule
Championship appearances
Location of the game
A simple rating system
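
The modified Elo rating in the list above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the update factor K = 20, the 400-point scale, and the function names are assumptions, while the 1500 starting score and the lack of between-season reversion come from the description above.

```python
from collections import defaultdict

K = 20           # hypothetical update factor; the project's value is not stated
INITIAL = 1500   # new entrants start at 1500, as described above


def expected_score(rating_a, rating_b):
    """Standard Elo expectation that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=K):
    """Update both teams' ratings in place after one game."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - exp_win)
    ratings[winner] += delta
    ratings[loser] -= delta


def run_elo(games):
    """games: iterable of (season, winner_id, loser_id) in chronological order.

    Ratings carry over between seasons unchanged (no reversion to the mean).
    """
    ratings = defaultdict(lambda: float(INITIAL))
    for _season, winner, loser in games:
        update_elo(ratings, winner, loser)
    return dict(ratings)


# Made-up results for three teams across two seasons.
games = [(2016, "A", "B"), (2016, "A", "C"), (2017, "B", "C")]
elo = run_elo(games)
```

Because every update moves the same number of points from the loser to the winner, the average rating stays at 1500 no matter how many seasons are processed.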

And the following team-level characteristics for tournament performance:

Note: If a team plays in more than one tournament in a year, then these values are averaged over all tournaments it played that year.

Tournament appearances
Conference tournament championships
Points scored for winning/losing team
A measure of possession
Offensive efficiency
Defensive efficiency
Net Rating (Offensive - Defensive efficiency)
Assist Ratio
Turnover Ratio
Shooting Percentage
Effective Field Goal Percentage (adjusts for the fact that 3-point shots are more valuable)
FTA Rating (how good a team is at drawing fouls)
Percentage of team offensive rebounds
Percentage of team defensive rebounds
Percentage of team total rebounds
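
Several of the measures above can be derived from box-score totals. The sketch below uses commonly cited formulas; the field names, the 0.475 free-throw weight in the possession estimate, and the exact ratio definitions are assumptions and may differ from what the project computes.

```python
from dataclasses import dataclass


@dataclass
class BoxScore:
    """Single-game team totals; field names are illustrative, not the project's."""
    pts: int
    fgm: int
    fga: int
    fgm3: int
    fta: int
    orb: int
    drb: int
    ast: int
    tov: int


FT_WEIGHT = 0.475  # assumed share of free-throw attempts that end a possession


def possessions(t):
    """Estimate possessions from shots, offensive rebounds, turnovers, free throws."""
    return t.fga - t.orb + t.tov + FT_WEIGHT * t.fta


def advanced_stats(team, opp):
    """Return a dict of advanced measures for `team` in a game against `opp`."""
    off_eff = 100.0 * team.pts / possessions(team)  # points per 100 possessions
    def_eff = 100.0 * opp.pts / possessions(opp)    # points allowed per 100
    plays = team.fga + FT_WEIGHT * team.fta + team.ast + team.tov
    all_rebounds = team.orb + team.drb + opp.orb + opp.drb
    return {
        "possessions": possessions(team),
        "off_eff": off_eff,
        "def_eff": def_eff,
        "net_rating": off_eff - def_eff,
        "assist_ratio": 100.0 * team.ast / plays,
        "turnover_ratio": 100.0 * team.tov / plays,
        "efg_pct": (team.fgm + 0.5 * team.fgm3) / team.fga,
        "fta_rate": team.fta / team.fga,
        "orb_pct": team.orb / (team.orb + opp.drb),
        "drb_pct": team.drb / (team.drb + opp.orb),
        "trb_pct": (team.orb + team.drb) / all_rebounds,
    }


# Made-up totals for one game.
team = BoxScore(pts=80, fgm=30, fga=60, fgm3=8, fta=20, orb=10, drb=25, ast=15, tov=12)
opp = BoxScore(pts=70, fgm=26, fga=62, fgm3=6, fta=16, orb=9, drb=22, ast=12, tov=14)
stats = advanced_stats(team, opp)
```

Effective field goal percentage counts each made 3-pointer as 1.5 field goals, which is how it adjusts for the extra value of a 3-point shot.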

Finally, predictions made with these features are weighted and stacked with predictions derived from team TrueSkill ratings.
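
One way to turn TrueSkill ratings into a matchup probability and weight it against a model prediction is sketched below. It is an assumption-laden illustration: the win-probability formula is a common approximation for two-team games, BETA is the TrueSkill default of 25/6, and the 0.7 blending weight and the function names are hypothetical rather than taken from the project. It uses statistics.NormalDist, which requires Python 3.8+.

```python
from math import sqrt
from statistics import NormalDist

BETA = 25.0 / 6.0  # TrueSkill's default performance spread; assumed here


def trueskill_win_prob(mu_a, sigma_a, mu_b, sigma_b, beta=BETA):
    """Approximate probability that team A beats team B from Gaussian skill beliefs."""
    denom = sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return NormalDist().cdf((mu_a - mu_b) / denom)


def blend(p_model, p_trueskill, w_model=0.7):
    """Weighted average of two win probabilities; the 0.7 weight is hypothetical."""
    return w_model * p_model + (1.0 - w_model) * p_trueskill


# Made-up ratings and a made-up model probability for one matchup.
p_ts = trueskill_win_prob(mu_a=30.0, sigma_a=2.0, mu_b=25.0, sigma_b=2.5)
p_final = blend(p_model=0.62, p_trueskill=p_ts)
```

The blended value always lies between the two inputs, so the weight only controls how far the final prediction leans toward one source or the other.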

Classification Analysis

Binary classification models and related techniques explored in this project include:

Logistic Regression
K-Nearest Neighbors
Random Forests
Extra Trees
Support Vector Machines
Gradient Boosting
TensorFlow/Keras Neural Networks
Principal Component Analysis
Ensembling/Stacking and Weighting Models
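
The soft voting ensemble mentioned in the introduction averages the class probabilities produced by each classifier rather than counting their votes. The dependency-free sketch below shows only that averaging step, using made-up probabilities and hypothetical names; in scikit-learn the same idea is available through VotingClassifier with voting="soft", and nothing here is claimed to match the project's own configuration.

```python
def soft_vote(prob_by_model, weights=None):
    """Average each model's P(team 1 wins) for every matchup.

    prob_by_model: dict of model name -> list of probabilities, one per matchup.
    weights: optional dict of model name -> non-negative weight (default: equal).
    """
    names = list(prob_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_games = len(prob_by_model[names[0]])
    return [
        sum(weights[name] * prob_by_model[name][i] for name in names) / total
        for i in range(n_games)
    ]


# Made-up probabilities for three matchups from three of the classifiers.
probs = {
    "logistic_regression": [0.70, 0.40, 0.60],
    "random_forest": [0.80, 0.35, 0.45],
    "gradient_boosting": [0.75, 0.30, 0.50],
}
ensemble = soft_vote(probs)
picks = [int(p >= 0.5) for p in ensemble]  # 1 means team 1 is predicted to win
```

Averaging probabilities keeps the confidence information from each model, which is what a probability-scored competition needs from its predictions.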

Libraries

Machine learning libraries used in this project include:

scikit-learn
XGBoost
LightGBM
Keras
TensorFlow
