Let's use Machine Learning to predict tennis matches and hopefully beat the bookies. 🎾

CommanderPoe/tennis-prediction

Betting on 21.5

MID-TERM project | A simple strategy to beat the bookies using ML.

Thanks to @albertoviciana for being the best teammate ever.

Big Disclaimer: We are not betting experts, and we do not recommend trying this at home with real money if you don't know what you are doing or if you don't feel comfortable losing that money.



The dataset

The tennis dataset we used has a shape of (28335, 42) and is the result of merging ten years of data. You can find it here: tennis-data.co.uk.

Column descriptions:

  • |ATP| -> Tournament ID. 'int64'
  • |Location| -> Location of the tournament. 'object'
  • |Tournament| -> Name of the tournament. 'object'
  • |Date| -> Date of the tournament. 'datetime'
  • |Series| -> Type of tournament. 'object'
  • |Court| -> Surface of the match. 'object'
  • |Round| -> Round of the competition. 'object'
  • |Best of| -> Maximum number of sets that can be played in a match. 'float64'
  • |Winner| & |Loser| -> Winner/loser of the match. 'object'
  • |WRank| & |LRank| -> ATP rank of the winner/loser. 'float64'
  • |WPts| & |LPts| -> ATP ranking points of the winner/loser. 'float64'
  • |W1| - |W5|, |L1| - |L5| -> Games won by the winner/loser in each set (1 to 5) of the match. 'float64'
  • |Comment| -> Comments on the state of the match. 'object'
  • |Wsets| & |Lsets| -> Sets won by the winner/loser. 'float64'
  • |B365W| & |B365L| -> Bookies' odds (Bet 365). 'float64'
  • |EXW| & |EXL| -> Bookies' odds (Express). 'float64' / 'object'
  • |SJW| & |SJL| -> Bookies' odds (SJ). 'float64'
  • |PSW| & |PSL| -> Bookies' odds (Pinnacle). 'float64'
  • |LBW| & |LBL| -> Bookies' odds (Liberty). 'float64'
  • |MaxW| & |MaxL| -> Maximum odds offered (taken from Oddsportal). 'float64'
  • |MinW| & |MinL| -> Minimum odds offered (taken from Oddsportal). dtype unknown
  • |AvgW| & |AvgL| -> Average odds offered. 'float64'
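As a rough sketch of how the yearly files could be merged into one table with the dtypes listed above, here is a pandas version. It assumes each season has already been read into its own DataFrame (for example with `pd.read_excel`); the `merge_seasons` helper and the column lists are hypothetical, not taken from our notebook. Columns such as EXW that can arrive as 'object' are coerced to numbers, turning stray text into NaN.

```python
import pandas as pd

# Hypothetical column lists; adjust them to the columns actually present in your files.
ODDS_COLS = ["B365W", "B365L", "EXW", "EXL", "PSW", "PSL", "LBW", "LBL",
             "SJW", "SJL", "MaxW", "MaxL", "MinW", "MinL", "AvgW", "AvgL"]
NUMERIC_COLS = ["WRank", "LRank", "WPts", "LPts", "Wsets", "Lsets",
                "W1", "W2", "W3", "W4", "W5", "L1", "L2", "L3", "L4", "L5"] + ODDS_COLS


def merge_seasons(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Stack yearly DataFrames and coerce mixed-type columns to proper dtypes."""
    df = pd.concat(frames, ignore_index=True, sort=False)
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    for col in NUMERIC_COLS:
        if col in df.columns:
            # Stray strings (empty cells, "N/A", typos) become NaN instead of leaving the column as 'object'
            df[col] = pd.to_numeric(df[col], errors="coerce")
    # Chronological order matters later for rolling features such as total_avg
    return df.sort_values("Date", ignore_index=True)
```

For example, `merge_seasons([pd.read_excel(p) for p in paths])`, where `paths` is your own list of downloaded yearly files.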

[Image: raw data example]

Introduction

For the mid-term project at Ironhack, we decided to find a way to get rich quickly (the joke's on us 😄). We focused on a simple analysis of the betting market and tried to build a strategy to beat the bookies using Machine Learning. We picked every major tournament, i.e. ATPs, Grand Slams, Masters and Masters Cups, played from 2010 to 2021, and skipped 'small' tournaments like Challengers or ITFs. Compared to other sports like football, where a similar number of matches is easily reached with just the first and second divisions of any single European country, in tennis we had to squeeze the volume to reduce the variance.

We only took ATP-ranked players into account.
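A minimal pandas sketch of the filtering described above. The Series labels in `MAJOR_SERIES` are an assumption about how the tennis-data.co.uk files name ATP 250/500, Masters 1000, Grand Slam and Masters Cup events, so check `df["Series"].unique()` before relying on them. `keep_major_ranked` is a hypothetical helper, and it treats a missing WRank or LRank as an unranked player.

```python
import pandas as pd

# Assumed labels for the 'Series' column; verify them against your own data.
MAJOR_SERIES = {"ATP250", "ATP500", "Masters 1000", "Grand Slam", "Masters Cup"}


def keep_major_ranked(df: pd.DataFrame) -> pd.DataFrame:
    """Keep main-tour matches played between two ATP-ranked players."""
    is_major = df["Series"].isin(MAJOR_SERIES)
    # A missing rank (NaN) is taken to mean the player had no ATP ranking at the time
    both_ranked = df["WRank"].notna() & df["LRank"].notna()
    return df.loc[is_major & both_ranked].reset_index(drop=True)
```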

Goal

  • Get a positive Return on Investment (ROI).
  • A large volume of matches.
  • Find our niche (OVER/UNDER 21.5).


Rules of tennis

Check the rules of tennis here.

How does tennis betting work in a nutshell?

Let's say you bet 10€ on Nadal at odds of 1.5:

  1. If he wins, you go back home with your 10€ plus 5€ of profit.
  2. If he loses, you go back home with 0€.
  3. If he wins, since you bet 10€ and got 15€ back, your ROI (return on investment) on that specific bet is 50%.
  4. If he loses, your ROI is -100%.

If you want to dig deeper into how tennis betting works: Click here.

For our specific target market (OVER/UNDER 21.5): Click here.

Understanding the impact of ROI, YIELD & BEP.

ROI or Return on Investment is a measure of the efficiency of an investment – or how much money you can expect to make relative to the amount of money you risk.

PROFIT = (Wager Return - Amount Wagered)

ROI= PROFIT / AMOUNT_WAGERED

In the example above, if our pick wins, the profit would be 5€ and the ROI 50%.
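The PROFIT and ROI formulas can be checked against the Nadal example with a few lines of Python. This is a minimal sketch assuming decimal odds (as in the 1.5 example); `settle_bet` is a hypothetical helper name.

```python
def settle_bet(stake: float, odds: float, won: bool) -> tuple[float, float]:
    """Return (profit, roi) for a single bet at decimal odds."""
    wager_return = stake * odds if won else 0.0  # money you take home
    profit = wager_return - stake                # PROFIT = Wager Return - Amount Wagered
    roi = profit / stake                         # ROI = PROFIT / AMOUNT_WAGERED
    return profit, roi


print(settle_bet(10, 1.5, won=True))   # (5.0, 0.5)
print(settle_bet(10, 1.5, won=False))  # (-10.0, -1.0)
```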

Yield is a percentage calculation of the betting efficiency, depending on the selected bets and odds for the match or bet slip.

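# = Number of bets (assuming a flat stake of 1 unit per bet)

YIELD = (Total Return - #) / #

where Total Return is everything the winning bets paid back, so Total Return - # is the net profit.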

Yield and ROI have very similar formulas. The difference is that ROI gives you your profit/loss relative to your initial investment, while yield gives you your average profit on the turnover, i.e. per unit wagered across all your individual bets.
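To make the yield formula concrete, here is a small sketch assuming the same flat stake on every bet, so the turnover is simply stake × number of bets. `betting_yield` is a hypothetical helper, and the ten-bet record is illustrative, not real data.

```python
def betting_yield(results: list[tuple[float, bool]], stake: float = 1.0) -> float:
    """Yield = net profit / turnover, with the same flat stake on every bet.

    results: list of (decimal_odds, won) pairs.
    """
    n_bets = len(results)
    total_return = sum(stake * odds for odds, won in results if won)
    turnover = stake * n_bets  # total amount wagered
    return (total_return - turnover) / turnover


# Ten bets at 1.80 odds: six winners, four losers
record = [(1.80, True)] * 6 + [(1.80, False)] * 4
print(round(betting_yield(record), 4))  # 0.08
```

With a 1-unit stake the turnover equals the number of bets #, so the function computes (Total Return - #) / #.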

The BEP or Break Even Point is that magical number: if we win at least this percentage of the time, we will not lose money, and ideally we go beyond it to make a nice return on our investment.

BEP = 1 / odds

Using 1.80 odds (more or less the average offered by the bookies for OVER/UNDER 21.5 in tennis), we'd need to win 1 / 1.80 ≈ 55.56% of the time. Rounding up, we could say that anything above 56% is net profit in the long run.
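The BEP can be extended into an expected yield: at decimal odds o and a true win rate p, each unit staked returns p·o on average, so the long-run yield is p·o - 1. A quick sketch, assuming independent bets at constant odds; the helper names are hypothetical.

```python
def break_even(odds: float) -> float:
    """Minimum win rate needed not to lose money at these decimal odds."""
    return 1 / odds


def expected_yield(win_rate: float, odds: float) -> float:
    """Long-run profit per unit staked: win_rate * odds - 1."""
    return win_rate * odds - 1


print(f"{break_even(1.80):.2%}")  # 55.56%
for p in (0.55, 0.56, 0.60):
    print(p, f"{expected_yield(p, 1.80):+.2%}")
```

At 1.80 odds this gives expected yields of roughly -1%, +0.8% and +8% for win rates of 55%, 56% and 60%.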

How to use the model?

Since there's no GUI or fancy web page to try our findings, if you want to use our model to predict tennis results, you'll have to interact with the script through Jupyter Notebook or the code editor of your choice.

These were the two best results we got from running several different Machine Learning models on over 20k matches.

[Image: Logistic Regression results]

[Image: Gradient Boosting results]
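A comparison of the two models could look roughly like the scikit-learn sketch below. It rests on assumptions: the default hyperparameters, the 80/20 split and the `compare_models` helper are illustrative, and the synthetic `make_classification` data is only a placeholder for the real match features and the OVER/UNDER label. Accuracy is the win rate if you bet on every prediction (OVER or UNDER); precision is the win rate if you only bet when the model predicts OVER.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, seed: int = 42) -> dict[str, dict[str, float]]:
    """Fit both models on a training split and score them on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        scores[name] = {
            # Win rate if we bet on every prediction (OVER or UNDER)
            "accuracy": accuracy_score(y_test, preds),
            # Win rate if we only bet when the model predicts OVER (label 1)
            "precision": precision_score(y_test, preds),
        }
    return scores


# Synthetic placeholder data: 8 numeric features, binary label (1 = OVER, 0 = UNDER)
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
print(compare_models(X, y))
```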

Keep in mind that there's still a lot of room for improvement. Nonetheless, these numbers look high enough for our purpose, staying above the BEP (about 55.6% at 1.80 odds).

No matter how badly we do in the first couple of matches, as long as we keep a win rate of at least 60%, we can expect to book a nice yield of around 5% per bet in the long run, regardless of the outcome of single bets.

In this kind of situation, you want one of these two things, or both if possible:

  1. Bet as much as possible, squeezing the volume to reduce the variance in the short term.
  2. Bet big amounts of money on single bets, since we are booking a nice +5% yield on every bet.
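Why volume matters can be shown with a quick Monte Carlo sketch. It assumes independent 1-unit bets, a constant 60% win rate and constant 1.80 odds (the figures used in this README), which real betting never quite delivers. `simulate_yield` is a hypothetical helper.

```python
import random


def simulate_yield(n_bets: int, win_rate: float = 0.60, odds: float = 1.80, seed: int = 0) -> float:
    """Simulate n flat 1-unit bets and return the realised yield (profit / turnover)."""
    rng = random.Random(seed)
    profit = 0.0
    for _ in range(n_bets):
        # A bet wins with probability win_rate; a win pays (odds - 1), a loss costs the 1-unit stake
        profit += (odds - 1) if rng.random() < win_rate else -1.0
    return profit / n_bets


for n in (10, 100, 1_000, 100_000):
    print(n, f"{simulate_yield(n):+.2%}")
```

With only 10 or 100 bets the realised yield can land far from the long-run value of 0.60 × 1.80 - 1 = 8%; with 100,000 bets it settles close to it.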

Remember that bankroll management is an essential rule for anyone eager to play with markets in general. So, as a rule of thumb, I would never recommend betting more than 3% of your total bankroll on a single bet.
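The 3% rule can be sketched as a fixed-fraction staking plan. This assumes the 3% is recomputed from the current bankroll before every bet (a flat 3% of the starting bankroll is the other common reading); `run_bankroll` is a hypothetical helper.

```python
def run_bankroll(bankroll: float, bets: list[tuple[float, bool]], fraction: float = 0.03) -> list[float]:
    """Stake `fraction` of the current bankroll on each bet; return the bankroll after every bet."""
    history = []
    for odds, won in bets:
        stake = bankroll * fraction  # never more than 3% of what you currently have
        bankroll += stake * (odds - 1) if won else -stake
        history.append(bankroll)
    return history


print(run_bankroll(1000.0, [(1.80, True), (1.80, False)]))
```

Starting from 1000€, a win at 1.80 takes the bankroll to about 1024€ and a following loss to about 993.28€.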

To predict future matches using our model, you would need the following information about the tournaments and the players:

[Image: final table]

  • series
  • court
  • surface
  • wrank -> Rank of player 1
  • lrank -> Rank of player 2
  • wpts -> Ranking points of player 1
  • lpts -> Ranking points of player 2
  • total_avg -> Average games played by the 2 players in the past, regardless of the outcome.

Keep in mind that, to keep this table updated, you have to add the total games played in each match you predict once it's over and recompute total_avg. Being a bit familiar with the EDA and the code is highly recommended at this stage.
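One way to compute total_avg without leaking the result of the match being predicted is to walk through the matches in date order and update each player's history only after the feature has been computed. This is a sketch under our own assumptions: total_avg is taken as the mean of the two players' average total games in their previous matches, and the `p1`/`p2`/`total_games` keys and the `add_total_avg` helper are hypothetical.

```python
from collections import defaultdict


def add_total_avg(matches: list[dict]) -> list[dict]:
    """Attach total_avg to each match using only matches played before it.

    Each match dict needs 'p1', 'p2' and 'total_games'; matches must be in date order.
    """
    games_sum = defaultdict(float)  # total games in each player's past matches
    n_matches = defaultdict(int)    # number of past matches per player
    out = []
    for m in matches:
        players = (m["p1"], m["p2"])
        avgs = [games_sum[p] / n_matches[p] for p in players if n_matches[p] > 0]
        # None when neither player has any history yet
        total_avg = sum(avgs) / len(avgs) if avgs else None
        out.append({**m, "total_avg": total_avg})
        # Update history only after computing the feature, so a match never sees its own result
        for p in players:
            games_sum[p] += m["total_games"]
            n_matches[p] += 1
    return out
```

A player with no previous matches contributes nothing to the average. Once a predicted match is over, fill in its total_games before adding the next one.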

All this data can be easily collected before any match and entered into the table.

Conclusions and Key learnings...

A word of warning: Bookies are very tricky and have the brightest minds working for them; otherwise, they would not have stayed in business for this long. Why am I saying this?

Suppose you check the odds for our target market (OVER/UNDER 21.5). One might think that this market is always available and that the bookies only adjust the odds to the level of each player, but that's not always the case. They don't always offer the OVER/UNDER 21.5 market, because they move the number of games up and down depending on the statistics of the players in question.

For example, these are real odds from matches played this same week in Barcelona.

[Images 1 to 4: odds screenshots]

As you can see here, the number of games sometimes goes up to 22.5, and sometimes as low as 17.5 when there's a big gap in the players' skills. Our model could easily handle this by changing the threshold in the target variable function from 21.5 to the desired number; retraining the model afterwards would then be compulsory.
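Changing the line only requires making the threshold a parameter of the target function. A minimal pandas sketch, assuming the target is 1 when the total games across the W1-W5 and L1-L5 columns exceed the line; `over_under_target` is a hypothetical name, not the function from our notebook.

```python
import pandas as pd

# Games won by the winner (W1-W5) and the loser (L1-L5) in each set
SET_COLS = [f"W{i}" for i in range(1, 6)] + [f"L{i}" for i in range(1, 6)]


def over_under_target(df: pd.DataFrame, line: float = 21.5) -> pd.Series:
    """1 if the match went OVER `line` total games, 0 if UNDER."""
    # Unplayed sets are NaN; DataFrame.sum skips NaN by default
    total_games = df[SET_COLS].sum(axis=1)
    return (total_games > line).astype(int)
```

For a 17.5 market you would call `over_under_target(df, line=17.5)` and retrain. You may also want to drop retirements and walkovers first, e.g. `df[df["Comment"] == "Completed"]`, assuming that is how your files label finished matches.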

We can say that the average odds are generally around 1.8 (I'm saying this from empirical experience, not from a mathematical point of view) and the break-even point is around 55%, so we still have some margin to play with.

Bonus

My partner and I ran some numbers using odds from ten real matches played this week. We randomly marked six of them as won and four as lost, to keep the 60% precision (win rate) our model predicted. The results were astonishing: after 100 matches, we had an excellent 10.10% YIELD, and the ROI was +100%.

[Image: Excel record]

Understanding the statistics of your sports bets means projecting them over the long run, with a specific and strict bankroll, a standard bet size and disciplined betting behavior. When you combine those tactics with the ability to shop for the best lines in the business, you've got a formula for consistently winning while placing wagers on the outcome of sporting events.
