H2H stats Scheduled Games #1

Open
xChr11s opened this issue Sep 3, 2020 · 7 comments
Comments

@xChr11s

xChr11s commented Sep 3, 2020

Hey 👍
Your scraper looks very interesting so far.
I tested it a bit but couldn't get the results I wanted.
My goal is to get the data for upcoming matches.
It has to collect the last 15 matches each team has played and the last matches they played against each other.
This data is found under the H2H tab.
The program has to click "Show more matches" four times and scrape the teams and results.

I managed to click on "Scheduled" with the following code:

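import time
from selenium.webdriver.common.by import By

# Click the "Scheduled" tab to list upcoming matches (driver is the scraper's WebDriver)
button_tomorrow = driver.find_element(By.XPATH, '/html/body/div[4]/div[1]/div/div[1]/div[2]/div[4]/div[2]/div[1]/div[1]/div[6]/div')
button_tomorrow.click()
time.sleep(5)  # give the page time to load the scheduled matches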

Would this be possible to code?

Thanks in advance,

Kind regards
Chris

@msarnacki
Owner

Hello! 👍

Thank you very much for your interest in my project.

I added a small new script (h2h.py) to the repo that scrapes the H2H match info you wanted. It can scrape as many recent matches as you like.

def get_matches_info(matches, how_many):
    for i, match in enumerate(matches):  
        if i == how_many:
            break
        #get teams names and score and print it
        teams = match.find_all(class_ = 'name')
        team1 = teams[0].text
        team2 = teams[1].text
        score = match.find(class_ = 'score').text
        print(str(i) + '. ' + team1 + ' ' + score + ' ' + team2)

The function takes 2 arguments:

  • matches: the list of rows from a table of matches. For example, home_matches = overall.find(class_ = 'h2h_home').find_all(class_ = 'highlight') first finds the h2h_home class, which is the table with the home team's last matches, then finds all matches within that table.
  • how_many: how many matches to scrape. The maximum is the length of the list, for example len(home_matches).

And here is the code that clicks "Show more matches" four times: twice for the home table and twice for the away table.
It is not necessary in the script, because all the match info is in the source code even when it is hidden, but yes, this is certainly possible. ☺️

import time
from selenium.webdriver.common.by import By

for i in range(2):
    # get the list of arrow elements (an arrow always accompanies "Show more matches")
    show_more = driver.find_elements(By.CLASS_NAME, 'arrow')
    # click the first "more" and wait a second
    show_more[0].click()
    time.sleep(1)
    # click the second "more" and wait a second
    show_more[1].click()
    time.sleep(1)
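
The point that hidden rows are already in the page source can be illustrated with a small, self-contained sketch. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and the markup in SAMPLE_HTML is a made-up stand-in, not the real page structure. Only the class names name, score, h2h_home and highlight come from this thread; everything else is an assumption.

```python
from html.parser import HTMLParser

# Hypothetical markup: the second row is hidden by CSS but still in the source.
SAMPLE_HTML = """
<table class="h2h_home">
  <tr class="highlight"><td class="name">West Brom</td><td class="name">QPR</td><td class="score">2 : 2</td></tr>
  <tr class="highlight" style="display: none"><td class="name">Huddersfield</td><td class="name">West Brom</td><td class="score">2 : 1</td></tr>
</table>
"""


class MatchRowParser(HTMLParser):
    """Collects (team1, team2, score) from every <tr>, visible or not."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows
        self._cells = None   # cells of the row being read, or None
        self._field = None   # class of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = {"name": [], "score": []}
        elif tag == "td" and self._cells is not None:
            css_class = dict(attrs).get("class", "")
            self._field = css_class if css_class in self._cells else None

    def handle_data(self, data):
        # Only keep text that sits inside a name or score cell.
        if self._field and data.strip():
            self._cells[self._field].append(data.strip())

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._cells is not None:
            names, scores = self._cells["name"], self._cells["score"]
            if len(names) == 2 and scores:
                self.rows.append((names[0], names[1], scores[0]))
            self._cells = None


parser = MatchRowParser()
parser.feed(SAMPLE_HTML)
for team1, team2, score in parser.rows:
    print(team1, score, team2)
```

The parser never looks at the style attribute, so the hidden row is collected like any other. In the real scraper the same idea would presumably apply to the HTML taken from driver.page_source.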

I added comments to every key part of the code. If you have more questions, please feel free to ask me. 😃

If you have more ideas on how to do this, you can fork my repo and open a pull request. 😃

Kind regards
Maciej

@xChr11s
Author

xChr11s commented Sep 4, 2020

Hey,
wow, thanks for your fast answer and code!

I didn't know that the matches are still in the source code even if "Show more matches" is not clicked.
It threw an error that the Bet365 ad was in front, so I just deleted the four clicks and it worked fine :)

selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <span class="arrow"></span> is not clickable at point (507, 881). Other element would receive the click: <a class="boxOverContent__bannerLink" href="/promobox/11313/?sport=1" data-mobile-url="/promobox/11313/?sport=1&amp;mobile=1" target="_blank"></a>
  (Session info: chrome=85.0.4183.83)

I got the following data from your match:

Home team last matches
0. West Brom 2 : 2 QPR
1. Huddersfield 2 : 1 West Brom
2. West Brom 0 : 0 Fulham
3. Blackburn 1 : 1 West Brom
4. West Brom 2 : 0 Derby
5. West Brom 4 : 2 Hull
6. Sheffield Wed 0 : 3 West Brom
7. Brentford 1 : 0 West Brom
8. West Brom 0 : 0 Birmingham
9. Swansea 0 : 0 West Brom
10. West Brom 2 : 3 Newcastle
11. West Brom 0 : 1 Wigan
12. West Brom 2 : 0 Preston
13. Bristol City 0 : 3 West Brom
14. West Brom 2 : 2 Nottingham

Away team last matches
0. Leicester 0 : 0 Sheffield Wed
1. Birmingham 0 : 2 Leicester
2. Leicester 0 : 2 Manchester Utd
3. Tottenham 3 : 0 Leicester
4. Leicester 2 : 0 Sheffield Utd
5. Bournemouth 4 : 1 Leicester
6. Arsenal 1 : 1 Leicester
7. Leicester 3 : 0 Crystal Palace
8. Everton 2 : 1 Leicester
9. Leicester 0 : 1 Chelsea
10. Leicester 0 : 0 Brighton
11. Watford 1 : 1 Leicester
12. Leicester 4 : 0 Aston Villa
13. Leicester 1 : 0 Birmingham
14. Norwich 1 : 0 Leicester

VS each other last matches
0. West Brom 1 : 4 Leicester
1. Leicester 1 : 1 West Brom
2. West Brom 1 : 2(1 : 1) Leicester
3. West Brom 0 : 1 Leicester
4. Leicester 1 : 2 West Brom

I will try to export these into an Excel file. If I find a way, I can fork the repo :)
Maybe I will need some help there, but I will try on my own first :)

Thanks !

@msarnacki
Owner

Hey,

yeah, sometimes ads or cookie notifications get in front of buttons. It depends on window size, sometimes it is good to maximize window or scroll page to the element you want to click.
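
The scroll-before-click idea could be sketched roughly like this. scroll_and_click is a hypothetical helper name, not something from the repo; it only relies on the WebDriver's execute_script method and the standard DOM scrollIntoView call. It is not guaranteed to get past every banner, since an overlay fixed in the middle of the window would still intercept the click.

```python
def scroll_and_click(driver, element):
    """Scroll the element to the middle of the viewport, then click it.

    `driver` is expected to be a Selenium WebDriver and `element` one of
    its WebElements (hypothetical helper, not part of the original script).
    """
    # scrollIntoView is a standard DOM method; block: 'center' keeps the
    # element away from banners pinned to the top or bottom of the window.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    element.click()
```

A possible use would be calling driver.maximize_window() once after the page loads and then scroll_and_click(driver, arrow) for each element with the arrow class, instead of calling click() on it directly.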

With matches being in code even when they are hidden, I think it is not a common thing.
In results from a particular season, for example here matches that are hidden are not in the source code and first you need to click "show more".

☺️

@debugleader

Hey, awesome project btw @msarnacki
Also, @xChr11s, do you wanna start a new project with python?
If you need my help, I would be glad to contribute and try to improve it :)

@xChr11s
Author

xChr11s commented Sep 9, 2020

> Hey,
>
> yeah, sometimes ads or cookie notifications get in front of buttons. It depends on window size, sometimes it is good to maximize window or scroll page to the element you want to click.
>
> With matches being in code even when they are hidden, I think it is not a common thing.
> In results from a particular season, for example here matches that are hidden are not in the source code and first you need to click "show more".
>
> ☺️

Hey,
yes, on the H2H page it is good that you don't need to click "Show more matches".
I still couldn't figure out how to save the match data in an Excel file; my Python knowledge isn't good enough, I guess.
I changed the code a bit so that it looks like this:
Gyazo

And my goal is to get the data saved like this:
Gyazo

But I'm only getting errors x)
Do you have an idea how to save it like this?
I will keep trying, but I don't think I can get it working.

> Hey, awesome project btw @msarnacki
> Also, @xChr11s, do you wanna start a new project with python?
> If you need my help, I would be glad to contribute and try to improve it :)

I don't think my Python skills are good enough x)
But thanks for your help :)

@debugleader

Hey it's fine, we always start somewhere :)
Tell me if you're down to learn more @xChr11s

@xChr11s
Author

xChr11s commented Sep 9, 2020

I guess I'm not good enough to get this done. I've tried for several hours now and I'm completely done ...
I just want the output from the scrape in a simple Excel sheet ... this can't be that hard ..
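
One way to get the scraped output into a sheet, sketched with only the standard library: parse each printed line and write a CSV file, which Excel opens directly. The column layout, the helper names parse_line and write_results, and the file name h2h.csv are all assumptions, since the target layout in the screenshot is not available in the thread. A parenthesised score such as (1 : 1) is simply dropped here, because the thread does not say what it represents.

```python
import csv
import re

# Matches lines such as "0. West Brom 2 : 2 QPR" or
# "West Brom 1 : 2(1 : 1) Leicester"; the leading "0. " index is optional.
LINE_RE = re.compile(
    r"^(?:\d+\.\s*)?(.+?)\s+(\d+) : (\d+)(?:\(\d+ : \d+\))?\s+(.+)$"
)


def parse_line(line):
    """Split one printed result into (home, home_goals, away_goals, away)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    home, home_goals, away_goals, away = m.groups()
    return home, int(home_goals), int(away_goals), away


def write_results(path, sections):
    """Write {section name: [result lines]} to a CSV file Excel can open."""
    # utf-8-sig adds a byte order mark, which helps Excel detect UTF-8.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["section", "home", "home_goals", "away_goals", "away"])
        for section, lines in sections.items():
            for line in lines:
                writer.writerow([section, *parse_line(line)])


# A few lines taken from the output posted earlier in this thread.
sections = {
    "Home team last matches": [
        "0. West Brom 2 : 2 QPR",
        "1. Huddersfield 2 : 1 West Brom",
    ],
    "VS each other last matches": [
        "West Brom 1 : 2(1 : 1) Leicester",
    ],
}
write_results("h2h.csv", sections)
```

Collecting tuples inside get_matches_info instead of printing them would avoid the parsing step entirely. If a real .xlsx file is needed, pandas' DataFrame.to_excel is another option, provided pandas and an Excel engine such as openpyxl are installed.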


3 participants