PWHL Scraper #34
Walkthrough
The recent updates introduce a set of functions for retrieving and processing Professional Women's Hockey League (PWHL) data: play-by-play data, schedules, team rosters, team details, and statistical leaders. The changes aim to streamline data extraction so that users can analyze and visualize PWHL games and player performances.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files ignored due to path filters (1)
- _pkgdown.yml is excluded by: !**/*.yml
Files selected for processing (14)
- DESCRIPTION (1 hunks)
- NAMESPACE (1 hunks)
- NEWS.md (1 hunks)
- R/pwhl_pbp.R (1 hunks)
- R/pwhl_schedule.R (1 hunks)
- R/pwhl_team_roster.R (1 hunks)
- R/pwhl_teams.R (1 hunks)
- README.Rmd (2 hunks)
- README.md (2 hunks)
- cran-comments.md (1 hunks)
- man/pwhl_schedule.Rd (1 hunks)
- man/pwhl_team_roster.Rd (1 hunks)
- man/pwhl_teams.Rd (1 hunks)
- vignettes/introduction.Rmd (2 hunks)
Files skipped from review due to trivial changes (4)
- NEWS.md
- man/pwhl_schedule.Rd
- man/pwhl_teams.Rd
- vignettes/introduction.Rmd
Additional comments: 11
man/pwhl_team_roster.Rd (1)
- 1-26: The documentation for `pwhl_team_roster` is clear and comprehensive: it accurately describes the function's purpose, parameters, and return value, and provides a useful example.
DESCRIPTION (1)
- 2-11: The updates to the title, description, version number, and URL in the DESCRIPTION file accurately reflect the package's new focus on the Professional Women's Hockey League and its functionalities.
R/pwhl_teams.R (1)
- 16-88: Verify the stability and long-term functionality of the hardcoded callback `angular.callbacks._4` in the URL used to fetch PWHL teams data. Consider implementing a more dynamic approach if the callback parameter is subject to change.
NAMESPACE (1)
- 52-54: The export statements for `pwhl_schedule`, `pwhl_team_roster`, and `pwhl_teams` are correctly added, making these functions available to package users.
R/pwhl_schedule.R (1)
- 16-110: Verify the stability and long-term functionality of the hardcoded callback `angular.callbacks._1` in the URL used to fetch PWHL schedule data. Consider implementing a more dynamic approach if the callback parameter is subject to change.
R/pwhl_team_roster.R (1)
- 18-125: Verify the stability and long-term functionality of the hardcoded callback `angular.callbacks._h` in the URL used to fetch PWHL team roster data. Consider implementing a more dynamic approach if the callback parameter is subject to change.
README.md (1)
- 103-131: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [20-128]
  The updates to URLs, season kickoff dates, and social media handles in the README.md file accurately reflect the package's new focus on the Professional Women's Hockey League and its contributors.
README.Rmd (4)
- 25-25: The URL has been updated to point to the new PWHL website. Ensure all references to the old PHF website are updated across the entire documentation and codebase to maintain consistency.
- 27-27: The update to the season start date and broadcasting details aligns with the shift to the PWHL. Verify that all season-related information in the package reflects this new start date.
  Verification successful
  The verification process confirmed the presence of the new season start date, "January 1st," in `README.md`, `README.Rmd`, and `vignettes/introduction.Rmd`. No outdated references to "November 6th" were found, and searches for general terms related to the season did not produce additional results that would indicate missing updates. Based on this information, it appears that the season-related information in the package reflects the new start date as intended.
  Scripts Executed
  The following scripts were executed for the analysis:
  Script:
      #!/bin/bash
      # Search for any outdated season start dates.
      rg 'November 6th'
  Length of output: 19
  Script:
      #!/bin/bash
      # Search for mentions of the new start date "January 1st" to confirm its presence.
      rg 'January 1st'
      # Additionally, search for general terms related to the season to catch any other potentially outdated or relevant information.
      rg 'season start'
      rg 'season 2024'
  Length of output: 642
- 27-27: The objective to advance women's hockey analytics is clearly stated. This should be consistently reflected in the package's vignettes, examples, and use cases.
  Verification successful
  The script's output confirms that the package's vignettes and examples reflect a focus on advancing women's hockey analytics, as stated in the README.Rmd file. This alignment with the objective to push women's hockey analytics forward is evident in the documentation, verifying the accuracy of the review comment.
  Scripts Executed
  The following scripts were executed for the analysis:
  Script:
      #!/bin/bash
      # Check for vignettes and examples that reflect the focus on women's hockey analytics.
      fd '.Rmd' --exec grep -i 'women\'s hockey analytics'
  Length of output: 93
  Script:
      #!/bin/bash
      # Corrected script to check for vignettes and examples that reflect the focus on women's hockey analytics.
      fd '.Rmd' --exec grep -i "women's hockey analytics"
  Length of output: 909
- 89-103: The update in author and contributor information to include Twitter handles enhances the visibility and accessibility of contributors. Ensure that all contributors have consented to this change and that their handles are correctly linked.
Review Status
Actionable comments generated: 11
Configuration used: CodeRabbit UI
Files selected for processing (3)
- R/pwhl_pbp.R (1 hunks)
- R/pwhl_standings.R (1 hunks)
- R/pwhl_stat_leaders.R (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- R/pwhl_stat_leaders.R
Additional comments: 2
R/pwhl_standings.R (1)
- 107-107: The use of `dplyr::left_join` to merge `reg_standings` and `standings` is appropriate. Ensure that the joining keys (`team`, `team_code`) uniquely identify rows in both data frames to avoid unintended duplication.
R/pwhl_pbp.R (1)
- 437-440: The use of `dplyr::select` and `dplyr::mutate` to clean and augment `game_df` is good. However, ensure that the `game_id` parameter is properly passed and used within the function, as it's not explicitly defined within the function scope.
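The join-key advice for R/pwhl_standings.R can be checked before merging. The following is a minimal base R sketch, not the package's code: the helper `has_unique_keys` is hypothetical, and the two small data frames are made-up stand-ins for `reg_standings` and `standings`.

```r
# Hypothetical helper: TRUE when the key columns identify each row at most once.
has_unique_keys <- function(df, keys) {
  # anyDuplicated() returns 0 when no row of the key columns is repeated
  anyDuplicated(df[, keys, drop = FALSE]) == 0
}

# Made-up stand-ins for the two tables being joined (illustrative values only)
reg_standings <- data.frame(
  team = c("A", "B"),
  team_code = c("AAA", "BBB"),
  wins = c(3, 1)
)
standings <- data.frame(
  team = c("A", "B", "B"),
  team_code = c("AAA", "BBB", "BBB"),
  points = c(6, 2, 2)
)

keys <- c("team", "team_code")
left_ok <- has_unique_keys(reg_standings, keys)
right_ok <- has_unique_keys(standings, keys)

# Joining on non-unique keys on the right side would duplicate left-hand rows
if (!right_ok) {
  message("standings has repeated (team, team_code) pairs; the join would duplicate rows")
}
```

With dplyr 1.1.0 or later, passing `relationship = "one-to-one"` to `dplyr::left_join` is another way to make the join fail loudly when the keys are not unique.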
Review Status
Actionable comments generated: 4
Configuration used: CodeRabbit UI
Files selected for processing (2)
- R/pwhl_pbp.R (1 hunks)
- R/pwhl_stat_leaders.R (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- R/pwhl_stat_leaders.R
Additional comments: 1
R/pwhl_pbp.R (1)
- 24-25: The approach to manipulate the JSON response using `gsub` to handle JSONP format is acknowledged. Given the specific format of the response, this might be the most straightforward method without introducing additional dependencies. However, remain cautious about potential future changes in the API's response format that could break this logic.
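One way to make the JSONP handling less sensitive to the callback name (the reviews elsewhere in this PR flag hardcoded names such as `angular.callbacks._4`) is to strip the wrapper with a pattern that matches any callback identifier. This is a sketch under assumptions, not the code in R/pwhl_pbp.R: the function name `strip_jsonp` is hypothetical, the sample payload is made up, and the response is assumed to have the shape `callback(<json>);`.

```r
# Hypothetical helper: remove a JSONP wrapper of the form `callback(<json>);`
# and return the bare JSON text. It does not depend on the callback's name.
strip_jsonp <- function(txt) {
  txt <- trimws(txt)
  # Drop a leading identifier (letters, digits, "_", "$", ".") and its opening "("
  txt <- sub("^[A-Za-z_$][A-Za-z0-9_$.]*\\(", "", txt)
  # Drop the closing ")" and an optional trailing ";"
  sub("\\);?$", "", txt)
}

# Made-up payload, for illustration only
raw <- 'angular.callbacks._4({"ok":true,"items":[1,2,3]});'
json_txt <- strip_jsonp(raw)

# Text that is already plain JSON passes through unchanged
plain <- strip_jsonp('{"ok":true}')
```

The resulting string can then be handed to a JSON parser such as `jsonlite::fromJSON`. If the response format changes in some other way, for example nested wrappers, this pattern would need revisiting.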
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files selected for processing (2)
- R/pwhl_pbp.R (1 hunks)
- R/pwhl_stat_leaders.R (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- R/pwhl_stat_leaders.R
Additional comments: 3
R/pwhl_pbp.R (3)
- 19-19: The URL construction using `glue::glue` is clear and concise, ensuring dynamic insertion of the `game_id`. This approach is efficient for constructing URLs with variable components.
- 21-23: The use of `httr::RETRY` for making the HTTP GET request is a good practice for resilience, especially in network operations that might fail due to transient issues. This ensures that the function will retry the request a few times before giving up, which can improve the reliability of the data fetching process.
- 433-436: The data frame manipulation using `dplyr` to select columns, mutate the data frame to include the `game_id`, and relocate the `game_id` column is well-implemented. This showcases a good understanding of `dplyr` functions and their use in data manipulation.
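The `game_id` handling discussed in these reviews can be illustrated with a small base R sketch in which `game_id` is an explicit argument rather than a value picked up from the enclosing scope. The function name `add_game_id` and the sample rows are hypothetical, and placing the column first is an assumption about what the relocation step does.

```r
# Hypothetical helper: attach a single game_id to every row and move it to the front.
# Base R stand-in for the mutate-then-relocate steps the review describes.
add_game_id <- function(game_df, game_id) {
  stopifnot(length(game_id) == 1)
  # rep() keeps this valid for zero-row data frames as well
  game_df$game_id <- rep(game_id, nrow(game_df))
  game_df[, c("game_id", setdiff(names(game_df), "game_id")), drop = FALSE]
}

# Made-up play-by-play rows, for illustration only
events <- data.frame(event = c("shot", "goal"), period = c(1, 2))
out <- add_game_id(events, game_id = 27)
```

Passing `game_id` explicitly means a missing or misspelled argument fails at the call site instead of silently using a variable from the global environment.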
Added the following functions:
- pwhl_team_roster
- pwhl_teams
- pwhl_schedule
they have the game box scores sort of set up, but I'd rather wait for that to start to be populated before trying to write the scraper off the skeleton there.