PWHL Scraper #34

Merged
merged 11 commits into from
Feb 22, 2024

benhowell71
Collaborator

@benhowell71 benhowell71 commented Dec 3, 2023

Added the following functions:

  • pwhl_team_roster
  • pwhl_teams
  • pwhl_schedule

They have the game box scores sort of set up, but I'd rather wait for those to start being populated before trying to write the scraper off the skeleton there.

Summary by CodeRabbit

  • New Features
    • Added functionality to retrieve and process play-by-play data for hockey games.
    • Introduced a new feature to fetch the PWHL schedule for specified seasons, including game details and winner calculations.
    • New function to obtain roster data for specific teams and seasons, catering to both regular and pre-season.
    • Implemented a feature to retrieve and display PWHL team information, including division and logo.
    • Added capability to fetch statistics for goalies or skaters based on team and season.


vercel bot commented Dec 3, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| fastrhockey | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Feb 22, 2024 3:20am |


sweep-ai bot commented Dec 3, 2023

Apply Sweep Rules to your PR?

  • Apply: All new business logic should have corresponding unit tests.
  • Apply: Refactor large functions to be more modular.
  • Apply: Add docstrings to all functions and file headers.


coderabbitai bot commented Feb 15, 2024

Walkthrough

The recent updates introduce a set of functions for retrieving and analyzing Professional Women's Hockey League (PWHL) data. These changes cover the retrieval and processing of play-by-play data, schedules, team rosters, team details, and statistical leaders. The modifications aim to streamline data extraction so that users can analyze and visualize PWHL games and player performances.

Changes

| File Path | Change Summary |
| --- | --- |
| R/pwhl_pbp.R | Introduces play-by-play data retrieval and processing for hockey games. |
| R/pwhl_schedule.R, R/pwhl_team_roster.R, R/pwhl_teams.R | Adds functions to fetch and process schedule, team rosters, and team details from the PWHL. |
| R/pwhl_stat_leaders.R | Adds functionality to retrieve and process statistics for goalies or skaters in the PWHL. |

"In the realm where data and ice collide,
🐇 A rabbit hops with code as its guide.
Through games and scores, it deftly slides,
Fetching stats with a swift, graceful stride.
🏒 To analyze and visualize, it provides,
In the world of hockey, where it proudly resides."


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 60c1df2 and d03f688.
Files ignored due to path filters (1)
  • _pkgdown.yml is excluded by: !**/*.yml
Files selected for processing (14)
  • DESCRIPTION (1 hunks)
  • NAMESPACE (1 hunks)
  • NEWS.md (1 hunks)
  • R/pwhl_pbp.R (1 hunks)
  • R/pwhl_schedule.R (1 hunks)
  • R/pwhl_team_roster.R (1 hunks)
  • R/pwhl_teams.R (1 hunks)
  • README.Rmd (2 hunks)
  • README.md (2 hunks)
  • cran-comments.md (1 hunks)
  • man/pwhl_schedule.Rd (1 hunks)
  • man/pwhl_team_roster.Rd (1 hunks)
  • man/pwhl_teams.Rd (1 hunks)
  • vignettes/introduction.Rmd (2 hunks)
Files skipped from review due to trivial changes (4)
  • NEWS.md
  • man/pwhl_schedule.Rd
  • man/pwhl_teams.Rd
  • vignettes/introduction.Rmd
Additional comments: 11
man/pwhl_team_roster.Rd (1)
  • 1-26: The documentation for pwhl_team_roster is clear and comprehensive, accurately describing the function's purpose, parameters, return value, and providing a useful example.
DESCRIPTION (1)
  • 2-11: The updates to the title, description, version number, and URL in the DESCRIPTION file accurately reflect the package's new focus on the Professional Women's Hockey League and its functionalities.
R/pwhl_teams.R (1)
  • 16-88: Verify the stability and long-term functionality of the hardcoded callback angular.callbacks._4 in the URL used to fetch PWHL teams data. Consider implementing a more dynamic approach if the callback parameter is subject to change.
NAMESPACE (1)
  • 52-54: The export statements for pwhl_schedule, pwhl_team_roster, and pwhl_teams are correctly added, making these functions available to package users.
R/pwhl_schedule.R (1)
  • 16-110: Verify the stability and long-term functionality of the hardcoded callback angular.callbacks._1 in the URL used to fetch PWHL schedule data. Consider implementing a more dynamic approach if the callback parameter is subject to change.
R/pwhl_team_roster.R (1)
  • 18-125: Verify the stability and long-term functionality of the hardcoded callback angular.callbacks._h in the URL used to fetch PWHL team roster data. Consider implementing a more dynamic approach if the callback parameter is subject to change.
README.md (1)
  • 103-131: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [20-128].
    The updates to URLs, season kickoff dates, and social media handles in the README.md file accurately reflect the package's new focus on the Professional Women's Hockey League and its contributors.

README.Rmd (4)
  • 25-25: The URL has been updated to point to the new PWHL website. Ensure all references to the old PHF website are updated across the entire documentation and codebase to maintain consistency.
  • 27-27: The update to the season start date and broadcasting details aligns with the shift to the PWHL. Verify that all season-related information in the package reflects this new start date.
Verification successful

The verification process confirmed the presence of the new season start date, "January 1st," in README.md, README.Rmd, and vignettes/introduction.Rmd. No outdated references to "November 6th" were found, and searches for general terms related to the season did not produce additional results that would indicate missing updates. Based on this information, it appears that the season-related information in the package reflects the new start date as intended.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for any outdated season start dates.
rg 'November 6th'

Length of output: 19



Script:

#!/bin/bash
# Search for mentions of the new start date "January 1st" to confirm its presence.
rg 'January 1st'
# Additionally, search for general terms related to the season to catch any other potentially outdated or relevant information.
rg 'season start'
rg 'season 2024'

Length of output: 642

  • 27-27: The objective to advance women's hockey analytics is clearly stated. This should be consistently reflected in the package's vignettes, examples, and use cases.
Verification successful

The script's output confirms that the package's vignettes and examples reflect a focus on advancing women's hockey analytics, as stated in the README.Rmd file. This alignment with the objective to push women's hockey analytics forward is evident in the documentation, verifying the accuracy of the review comment.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Check for vignettes and examples that reflect the focus on women's hockey analytics.
fd '.Rmd' --exec grep -i 'women\'s hockey analytics'

Length of output: 93



Script:

#!/bin/bash
# Corrected script to check for vignettes and examples that reflect the focus on women's hockey analytics.
fd '.Rmd' --exec grep -i "women's hockey analytics"

Length of output: 909

  • 89-103: The update in author and contributor information to include Twitter handles enhances the visibility and accessibility of contributors. Ensure that all contributors have consented to this change and that their handles are correctly linked.

R/pwhl_pbp.R: 1 review thread (outdated, resolved)
README.Rmd: 1 review thread (resolved)

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 8

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between d03f688 and 1946276.
Files selected for processing (1)
  • R/pwhl_pbp.R (1 hunks)

R/pwhl_pbp.R: 4 review threads (outdated, resolved)

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 7

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 1946276 and d1fec80.
Files selected for processing (1)
  • R/pwhl_stat_leaders.R (1 hunks)

R/pwhl_stat_leaders.R: 7 review threads (outdated, resolved)

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 11

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between d1fec80 and 2ae0779.
Files selected for processing (3)
  • R/pwhl_pbp.R (1 hunks)
  • R/pwhl_standings.R (1 hunks)
  • R/pwhl_stat_leaders.R (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • R/pwhl_stat_leaders.R
Additional comments: 2
R/pwhl_standings.R (1)
  • 107-107: The use of dplyr::left_join to merge reg_standings and standings is appropriate. Ensure that the joining keys (team, team_code) uniquely identify rows in both data frames to avoid unintended duplication.
R/pwhl_pbp.R (1)
  • 437-440: The use of dplyr::select and dplyr::mutate to clean and augment game_df is good. However, ensure that the game_id parameter is properly passed and used within the function, as it's not explicitly defined within the function scope.
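The join caveat raised for R/pwhl_standings.R can be checked before merging. The following is a minimal sketch in base R rather than the package's own code: `assert_unique_keys` is a hypothetical helper, the two data frames hold invented values, and `merge(..., all.x = TRUE)` stands in for `dplyr::left_join`. The same key check would apply before a `dplyr` join.

```r
# Hypothetical helper: stop if the key columns do not uniquely identify rows.
assert_unique_keys <- function(df, keys) {
  dup <- duplicated(df[, keys, drop = FALSE])
  if (any(dup)) {
    stop("Duplicate join keys in rows: ", paste(which(dup), collapse = ", "))
  }
  invisible(df)
}

keys <- c("team", "team_code")

# Invented example data standing in for the two standings tables.
reg_standings <- data.frame(
  team = c("Boston", "Toronto"),
  team_code = c("BOS", "TOR"),
  wins = c(3, 2)
)
standings <- data.frame(
  team = c("Boston", "Toronto"),
  team_code = c("BOS", "TOR"),
  points = c(9, 6)
)

# Fail early if either side has repeated (team, team_code) pairs.
assert_unique_keys(reg_standings, keys)
assert_unique_keys(standings, keys)

# Base R equivalent of a left join on the key columns.
joined <- merge(reg_standings, standings, by = keys, all.x = TRUE)

# With unique keys on both sides, the join cannot add rows.
stopifnot(nrow(joined) == nrow(reg_standings))
print(joined)
```

Checking both sides up front turns a silent row duplication into an explicit error at the point where the keys first stop being unique.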

R/pwhl_standings.R: 6 review threads (resolved)

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 4

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 2ae0779 and 7af48bc.
Files selected for processing (2)
  • R/pwhl_pbp.R (1 hunks)
  • R/pwhl_stat_leaders.R (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • R/pwhl_stat_leaders.R
Additional comments: 1
R/pwhl_pbp.R (1)
  • 24-25: The approach to manipulate the JSON response using gsub to handle JSONP format is acknowledged. Given the specific format of the response, this might be the most straightforward method without introducing additional dependencies. However, remain cautious about potential future changes in the API's response format that could break this logic.
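The JSONP handling described above can be sketched roughly as follows. This is an illustration in base R, not the code from R/pwhl_pbp.R: the helper name `strip_jsonp` and the sample payload are hypothetical, and the patterns assume the response is a single `callback(...)` wrapper such as `angular.callbacks._4(...)`. Matching on the outermost parentheses rather than on a specific callback name means the stripping step does not depend on which callback index the URL requested.

```r
# Hypothetical helper: remove a JSONP wrapper such as
# `angular.callbacks._4({...});` and return the bare JSON text.
strip_jsonp <- function(txt) {
  txt <- trimws(txt)
  # Drop everything up to and including the first "(".
  txt <- sub("^[^(]*\\(", "", txt)
  # Drop the final ")" and an optional trailing semicolon.
  sub("\\)[[:space:]]*;?[[:space:]]*$", "", txt)
}

# Made-up payload for illustration only; a real response would be larger.
raw <- 'angular.callbacks._4({"teams":[{"id":1,"code":"BOS"}]});'
json_txt <- strip_jsonp(raw)
print(json_txt)

# If jsonlite is installed, the stripped text can be parsed as usual.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  parsed <- jsonlite::fromJSON(json_txt)
  print(parsed$teams)
}
```

If the API ever returned plain JSON with no wrapper, the first `sub` call would leave the text unchanged, but the second could strip a legitimate trailing `)`, so a stricter version might first test whether the text starts with a callback name.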

R/pwhl_pbp.R: 1 review thread (outdated, resolved)
R/pwhl_pbp.R: 1 review thread (resolved)
R/pwhl_pbp.R: 1 review thread (outdated, resolved)

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 7af48bc and acedc65.
Files selected for processing (2)
  • R/pwhl_pbp.R (1 hunks)
  • R/pwhl_stat_leaders.R (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • R/pwhl_stat_leaders.R
Additional comments: 3
R/pwhl_pbp.R (3)
  • 19-19: The URL construction using glue::glue is clear and concise, ensuring dynamic insertion of the game_id. This approach is efficient for constructing URLs with variable components.
  • 21-23: The use of httr::RETRY for making the HTTP GET request is a good practice for resilience, especially in network operations that might fail due to transient issues. This ensures that the function will retry the request a few times before giving up, which can improve the reliability of the data fetching process.
  • 433-436: The data frame manipulation using dplyr to select columns, mutate the data frame to include the game_id, and relocate the game_id column is well-implemented. This showcases a good understanding of dplyr functions and their use in data manipulation.
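The URL construction and retry pattern praised above might look something like the sketch below. It is not the code from R/pwhl_pbp.R: the endpoint `https://example.com/feed` is a placeholder (the real feed URL does not appear in this thread), and `build_feed_url` and `fetch_feed_text` are hypothetical names. Only `glue::glue`, `httr::RETRY`, `httr::stop_for_status`, and `httr::content` are real package functions.

```r
# Placeholder endpoint: the real feed URL is not shown in this thread.
build_feed_url <- function(game_id,
                           base_url = "https://example.com/feed") {
  # glue interpolates the function arguments into the template.
  glue::glue("{base_url}?game_id={game_id}")
}

# Fetch the raw response body as text, retrying on transient failures.
fetch_feed_text <- function(game_id, times = 3) {
  url <- build_feed_url(game_id)
  # RETRY re-issues the GET request up to `times` attempts.
  res <- httr::RETRY("GET", url, times = times, quiet = TRUE)
  # Turn an HTTP error status into an R error instead of parsing bad data.
  httr::stop_for_status(res)
  httr::content(res, as = "text", encoding = "UTF-8")
}

# URL construction does not touch the network, so it is safe to try here.
if (requireNamespace("glue", quietly = TRUE)) {
  print(build_feed_url(27))
}
```

Keeping URL construction separate from the request makes the former testable offline, while the retry count stays adjustable per call.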

R/pwhl_pbp.R: 3 review threads (resolved)
@benhowell71 benhowell71 merged commit b4cc412 into main Feb 22, 2024
2 checks passed
@benhowell71 benhowell71 deleted the pwhl_scraper branch February 22, 2024 03:29
2 participants