In this project we develop and implement different regression methods on datasets with a large number of both numerical and categorical features. We explore linear regression as well as ensemble methods such as gradient boosting and random forests. We then evaluate each method's advantages and limitations, such as overfitting.
The code is divided into four parts:
- The Preprocessing.ipynb Jupyter Notebook contains the code to preprocess the raw data.
- The Regression.ipynb Jupyter Notebook contains the linear regression methods.
- The Random_Forest.ipynb Jupyter Notebook contains the random forest regression model.
- The Gradient Boosting folder contains the MATLAB files that implement the gradient boosting method.
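
As a rough illustration of the kind of preprocessing that mixed numerical and categorical features call for, here is a minimal sketch of one-hot encoding in plain Python. It is not taken from Preprocessing.ipynb. The function name `one_hot_encode` and the feature names `area`, `rooms`, and `zone` are hypothetical and chosen only for this example.

```python
def one_hot_encode(rows, categorical_keys):
    """Turn dict rows with mixed features into purely numeric vectors.

    Hypothetical helper for illustration; not the project's actual code.
    """
    # Collect the sorted set of levels observed for each categorical column.
    levels = {k: sorted({r[k] for r in rows}) for k in categorical_keys}
    # Every remaining column is treated as numerical.
    numeric_keys = sorted(k for k in rows[0] if k not in categorical_keys)
    names = numeric_keys + [
        f"{k}={v}" for k in categorical_keys for v in levels[k]
    ]
    encoded = []
    for r in rows:
        vec = [float(r[k]) for k in numeric_keys]
        for k in categorical_keys:
            # One indicator column per level: 1.0 where the row matches.
            vec.extend(1.0 if r[k] == v else 0.0 for v in levels[k])
        encoded.append(vec)
    return names, encoded


# Made-up rows, purely for demonstration.
rows = [
    {"area": 120, "rooms": 3, "zone": "A"},
    {"area": 80, "rooms": 2, "zone": "B"},
    {"area": 100, "rooms": 3, "zone": "A"},
]
names, X = one_hot_encode(rows, ["zone"])
```

Each categorical column expands into one indicator column per observed level, so the resulting vectors can be fed to any of the regression models listed above.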
More details about the project can be found in Description.pdf.
This project was done by Gabrielle Belok, Artin Spiridonoff, and me, supervised by Professor Prakash Ishwar.