Hello everyone! This is my submission for the ReDI Data Analysis course final project.
All data files were either scraped or obtained from the following websites:
Sports Reference (FBref) - Player data and team standings from the 1992/1993 to 2023/2024 seasons - https://fbref.com/en/
Player of the Season reference from 1992/1993 - 2023/2024 - https://en.wikipedia.org/wiki/Premier_League_Player_of_the_Season
To help you understand the columns and what they mean, each term used in the project is explained below.
Player
Pos - Position (DF: Defender, FW: Forward, MF: Midfielder, & GK: Goalkeeper)
Squad - Clubs
Age
MP - Number of matches the player featured in
Starts - Number of games the player started
Min - Total number of minutes a player played in a full season.
90s - Number of minutes played divided by 90
Gls - Total number of goals scored by a player in a full season
Ast - Total number of assists leading to a goal a player had in a full season
G+A - Combined goals and assists
CrdY - Total number of yellow cards
CrdR - Total number of red cards
Year - The year in which the season starts. For example, the 23/24 season is recorded as 2023 in the Year column.
Season
xG - Total number of expected goals in a season. xG totals include penalty kicks but do not include penalty shootouts. All "expected" metrics were provided by Opta.
npxG - Non-penalty Expected goals in a season
xAG - Expected assisted goals in a season
npxG+xAG - Non-penalty expected goals plus expected assisted goals in a season
PrgC - Progressive carries. Carries that move the ball toward the opponent's goal line at least 10 yards from its furthest point in the last six passes, or any carry into the penalty area. Excludes carries which end in the defending 50% of the pitch.
PrgP - Progressive passes. Completed passes that move the ball toward the opponent's goal line at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area. Excludes passes from the defending 40% of the pitch.
PrgR - Progressive passes received. Completed passes received by the player that move the ball toward the opponent's goal line at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area. Excludes passes from the defending 40% of the pitch.
poy_winner - Winner of the Player of the Season award: 1 for the winner and 0 for all other players.
Predicted_Rk - Predicted ranking of the players at the end of the season, based purely on the variables specified in the predictors list.
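Several of the columns above are simple functions of other columns. As a rough illustration only, here is a minimal sketch in plain Python of how they relate. The function name add_derived_columns, the "2023/2024" format of the Season value, the one-decimal rounding and the sample player are all assumptions made for this example, not taken from the project files.

```python
def add_derived_columns(row, poy_winners):
    """Return a copy of one player-season row with the derived columns filled in.

    `row` is a dict keyed by the column names from the glossary.
    `poy_winners` is a set of (Player, Season) pairs (hypothetical structure).
    """
    out = dict(row)
    out["90s"] = round(row["Min"] / 90, 1)            # minutes played divided by 90
    out["G+A"] = row["Gls"] + row["Ast"]              # goals plus assists
    out["npxG+xAG"] = round(row["npxG"] + row["xAG"], 1)
    # Assumes Season looks like "2023/2024"; Year is the year the season starts.
    out["Year"] = int(row["Season"].split("/")[0])
    out["poy_winner"] = 1 if (row["Player"], row["Season"]) in poy_winners else 0
    return out


# Made-up numbers for a placeholder player, purely for illustration.
sample = {
    "Player": "Player A", "Season": "2023/2024",
    "Min": 2700, "Gls": 10, "Ast": 5, "npxG": 8.2, "xAG": 4.1,
}
derived = add_derived_columns(sample, poy_winners={("Player A", "2023/2024")})
print(derived["90s"], derived["G+A"], derived["Year"], derived["poy_winner"])
```

The same arithmetic applies whatever tool holds the table, whether a spreadsheet or a DataFrame.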
The predictor variables include all numeric variables. The list can be adjusted as you desire:
[
'Age', 'MP', 'Starts', 'Min', '90s', 'Gls',
'Ast', 'G+A', 'xG', 'npxG', 'xAG',
'npxG+xAG', 'PrgC', 'PrgP', 'PrgR', 'CrdY', 'CrdR'
]
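To show how a predictors list like this can be turned into a ranking, here is a minimal sketch. It is not the model used in this project. It swaps in a simple stand-in technique: each predictor is converted to a z-score across the players, the z-scores are combined as a weighted sum, and the players are sorted by that score. The function rank_players, the weights argument and the sample rows are hypothetical.

```python
from statistics import mean, pstdev


def rank_players(rows, predictors, weights=None):
    """Rank players by a weighted sum of per-column z-scores (rank 1 = highest score).

    Stand-in technique for illustration; `weights` maps a column name to its
    weight and defaults to 1.0 for any column not listed.
    """
    weights = weights or {}
    scores = [0.0] * len(rows)
    for col in predictors:
        values = [float(r[col]) for r in rows]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:  # a constant column cannot separate players
            continue
        w = weights.get(col, 1.0)
        for i, v in enumerate(values):
            scores[i] += w * (v - mu) / sigma
    order = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return [{**rows[i], "Predicted_Rk": rank} for rank, i in enumerate(order, start=1)]


# Made-up rows with a shortened predictors list, purely for illustration.
players = [
    {"Player": "Player A", "Gls": 20, "Ast": 10, "CrdR": 0},
    {"Player": "Player B", "Gls": 5, "Ast": 2, "CrdR": 1},
    {"Player": "Player C", "Gls": 12, "Ast": 6, "CrdR": 0},
]
ranked = rank_players(players, ["Gls", "Ast", "CrdR"], weights={"CrdR": -1.0})
for r in ranked:
    print(r["Predicted_Rk"], r["Player"])
```

Adding or removing a name in the predictors list changes which columns feed the score. The negative weight on CrdR is one possible way to make cards count against a player, and it is an assumption of this sketch.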
Player rank based on variables in the predictors' list for 2023/2024 season
Player rank based on variables in the predictors' list for 2022/2023 season
I did some simple EDA to get an overview of what my data looks like.
I explored the following:
1. Players with the most progressive carries, passes received and passes in the 2021/2022, 2022/2023 & 2023/2024 seasons
2. Most penalised clubs (Red and Yellow cards) each season
3. The club with the highest number of players below 20 years old in the 2022/2023 & 2023/2024 seasons
4. Number of goals scored by the winner of the Player of the Season award
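Two of the questions above are group-and-count problems. The sketch below shows one possible way to answer them in plain Python. The function names, the "2023/2024" season format and the sample rows with placeholder clubs are assumptions for illustration, not the project's actual code or data.

```python
from collections import Counter, defaultdict


def most_penalised_club(rows):
    """Return {season: (club, total_cards)}, summing yellow and red cards per club."""
    totals = defaultdict(Counter)
    for r in rows:
        totals[r["Season"]][r["Squad"]] += r["CrdY"] + r["CrdR"]
    return {season: clubs.most_common(1)[0] for season, clubs in totals.items()}


def club_with_most_under_20(rows, season):
    """Return (club, n_players) for the club with the most players aged under 20."""
    counts = Counter(
        r["Squad"] for r in rows if r["Season"] == season and r["Age"] < 20
    )
    return counts.most_common(1)[0] if counts else None


# Made-up rows with placeholder clubs, purely for illustration.
rows = [
    {"Player": "P1", "Squad": "Club X", "Season": "2023/2024", "Age": 19, "CrdY": 3, "CrdR": 0},
    {"Player": "P2", "Squad": "Club X", "Season": "2023/2024", "Age": 18, "CrdY": 1, "CrdR": 0},
    {"Player": "P3", "Squad": "Club Y", "Season": "2023/2024", "Age": 27, "CrdY": 8, "CrdR": 1},
    {"Player": "P4", "Squad": "Club Y", "Season": "2022/2023", "Age": 19, "CrdY": 2, "CrdR": 0},
    {"Player": "P5", "Squad": "Club X", "Season": "2022/2023", "Age": 30, "CrdY": 5, "CrdR": 1},
]
penalised = most_penalised_club(rows)
youngest = club_with_most_under_20(rows, "2023/2024")
print(penalised)
print(youngest)
```

Counting a red card the same as a yellow card is a simplification. A weighted total would be an equally reasonable choice.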
Special thanks and acknowledgment go to ReDI School Munich, especially the tutors and volunteers, for their selfless work and contribution.
This project is purely for personal learning. There is no intention of claiming ownership of the data sources. All data sources have been duly referenced and acknowledged.
Please feel free to get in touch and let me know where I can improve. The project is still a work in progress, and I would be happy to expand its scope.