The objective of this project is to explain the high salaries of NBA (National Basketball Association) players. The project provides analytics for professional basketball team owners using data from http://www.basketballreference.com. It covers determining the factors that lead to high player salaries and estimating how much a team should pay for a new player given the player's record. The project is split into two phases:
Analysis 1:
In this phase, we scrape the essential data from www.basketballreference.com, perform ETL, and explore the data. This phase also answers simple statistical questions.
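The ETL step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the table name `player_salary`, its columns, the helper names, and the sample rows are all hypothetical placeholders, and an in-memory SQLite database stands in for the project's real one.

```python
import sqlite3

# Hypothetical rows standing in for scraped data:
# (player, season, points per game, salary string).
# These values are made-up placeholders, not real statistics.
SCRAPED_ROWS = [
    ("Player A", "2014", "25.1", "$20,000,000"),
    ("Player B", "2014", "12.4", "$5,500,000"),
    ("Player C", "2014", "8.0", "$1,200,000"),
]


def clean_salary(raw):
    """Transform a salary string such as '$5,500,000' into an integer."""
    return int(raw.replace("$", "").replace(",", ""))


def load(rows):
    """Clean the rows, load them into an in-memory SQLite database, and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE player_salary ("
        "player TEXT, season INTEGER, ppg REAL, salary INTEGER)"
    )
    cleaned = [
        (player, int(season), float(ppg), clean_salary(salary))
        for player, season, ppg, salary in rows
    ]
    conn.executemany("INSERT INTO player_salary VALUES (?, ?, ?, ?)", cleaned)
    conn.commit()
    return conn


def average_salary(conn, season):
    """Answer a simple statistical question: the mean salary in a given season."""
    row = conn.execute(
        "SELECT AVG(salary) FROM player_salary WHERE season = ?", (season,)
    ).fetchone()
    return row[0]


conn = load(SCRAPED_ROWS)
print(average_salary(conn, 2014))
```

Keeping the cleaning logic (`clean_salary`) separate from the loading logic makes it easier to check each transformation on its own before the data reaches the database.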
Analysis 2:
This phase answers most of the complex analytical questions through various techniques using R. The following are some of the statistical methods used:
- Clustering - Perform K-Means clustering and interpret the clusters.
- Linear Regression Models - Identify the dependent variable and the predictors, build the regression model, and interpret the results.
- Panel Data - Identify time-varying and time-invariant factors, and fit fixed effects and random effects models.
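As a rough illustration of the regression step listed above, the following sketch fits a one-predictor least-squares line in plain Python. The project itself uses R for this analysis; the Python version, the function names, the variables `points_per_game` and `salary_millions`, and the sample values are assumptions made purely for illustration, not the project's data or model.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical placeholder data: the predictor is points per game,
# the dependent variable is salary in millions of dollars.
points_per_game = [5.0, 10.0, 15.0, 20.0, 25.0]
salary_millions = [2.0, 4.5, 6.0, 9.5, 11.0]

intercept, slope = fit_simple_ols(points_per_game, salary_millions)
print("intercept:", intercept)
print("slope:", slope)
print("R^2:", r_squared(points_per_game, salary_millions, intercept, slope))
```

Here the slope would be read as the estimated change in salary (in millions) associated with one additional point per game. A real model would include several predictors and would need the usual diagnostic checks.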
The directories are structured as follows:
Analysis-1-Dataset/ - This directory contains the dataset used for Analysis-1
Analysis-2-Dataset/ - This directory contains the dataset used for Analysis-2
Python Codes/ - This directory contains all the Python code used to scrape the data from basketballreference.com
R Codes/ - R codes that are used to answer statistical questions in Analysis 2
Result/ - The result data set of Analysis-1
Schema/ - This directory contains the complete schema structure designed for this project. It also contains the SQL needed to build the schema
SQL/ - This directory contains all essential SQL queries used for this analysis. Note: SQLite3 is used as the database for this project
Analysis-1.pdf - Complete documentation containing answers to all questions in Analysis-1
Analysis-2.pdf - Complete documentation containing answers to all questions in Analysis-2