Can you predict which team will win FIFA World Cup Qatar 2022?
(Source: https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022)
FIFA World Cup [1] is the most famous football competition in the world and it will be held from Nov 20, 2022 to Dec 18, 2022 [2]. In this project, we want to predict results of the quarterfinals as well as the exact goals scored by each team using Machine Learning algorithms. Aiming at those goals, we need to leverage massive amounts of data from historical tournaments of each team and ratings of individual players, which is well-suited for machine learning approaches. Specifically, predicting results can be modeled as a classification problem and predicting goals scored by each team can be modeled as a regression problem. To evaluate our method, we will use data from FIFA World Cup 2018 as validation and finally test prediction accuracy on FIFA World Cup 2022 quarterfinals.
-
Prepare the prerequisites
pip install -r requirements.txt
-
The directory structure of this project is shown as below:
. ├── README.md ├── data │ ├── V1 │ ├── V2 │ └── raw_data ├── data_preparation.py ├── models ├── models.py ├── pic ├── requirements.txt ├── train.py └── utils.py
-
This project has collected and preprocessed the soccer datasets from Kaggle, e.g., FiFA World Cup, FIFA Rankings, Player Score, International Matches . It is recommended to put them under directory
data/raw_data
. -
To preprocess the data:
python data_preparation.py
Arguments are shown as below:
Argument Default Description --data_dir './data/V2' Data directory to save. --raw_data_dir './data/raw_data' Raw data directory. --cur_year 2018 Test data year.
P.S. In our V1 experiments, we observed the inbalance issues of feature importance. To alleviate this problem, we bring more relevant features in to our V2 dataset and achieved better results.
-
Train and evaluate ML models on WorldCup dataset. For example,
python train.py --model lgb
Some important arguments (listed below, more could be found in
train.py
):Argument Default Description --model lgb ML models for training and evaluation: e.g., KNN --data-dir 'data/V2' Directory path for the data set. --model-dir 'models' Directory for saving trained model files --cur-year 2018 Test set year.
-
Results for regression-based model
Model Scale Test MSE on V1 Test R2 on V1 Test Acc on V1 Test MSE on V2 Test R2 on V2 Test Acc on V2 PR Standard 2.1654 -1.1983 0.5 1.4530 -1.2368 0.5 RF None 2.2198 -1.2657 0.25 1.4877 -1.2650 0.25 GB None 2.1726 -1.2065 0.25 1.9732 -0.9924 0.5 KRR Standard 2.1813 -1.2546 0.5 1.9313 -1.0260 0.75 LGB MinMax 2.0754 -1.1583 0.5 1.3259 -0.1720 0.5 -
Results for classification-based model
Model Scale Test Acc on V1 Test Win Acc on V1 Test Acc on V2 Test Win Acc on V2 KNN None 0.125 0.75 0.125 0.75 -
Feature importance analysis
Enjoy the World Cup 2022!
[1] Nam-Su Kim and Laurence Chalip. Why travel to the fifa world cup? effects of motives, background, interest, and constraints. Tourism management, 25(6):695–707, 2004.
[2] Wikipedia. FIFA World Cup — Wikipedia, the free encyclopedia. http://en.wikipedia. org/w/index.php?title=FIFA%20World%20Cup&oldid=1111687599, 2022. [Online; ac- cessed 30-September-2022].