PHF & NHL Data
This repository holds historical boxscore and play-by-play data for the Premier Hockey Federation (PHF, formerly known as NWHL), which was compiled with the fastRhockey package from GitHub. You can find fastRhockey here: BenHowell71/fastRhockey
The scraper was created to increase access to play-by-play and boxscore data for the PHF. Limited access to that data has historically been one of the bigger barriers to entry in women’s hockey analytics.
This repo contains three main CSVs of data, each of which is outlined in a little more detail below.
phf_meta_data.csv: this CSV contains all the data that you’d want on an individual game in one row. It contains home/away teams, arena information, game IDs, league IDs, and more.
boxscore.csv: this CSV contains all the boxscore information from the PHF for the games in phf_meta_data.csv. It contains data on game ID, scoring by period, shots by period, power play numbers, and more, all broken down by each team involved in a game.
play_by_play.csv: this CSV contains all the play-by-play data from the PHF. It includes information on events, how many skaters were on the ice, penalties, shots, etc. This data is essentially complete for the more recent PHF seasons, while it is spottier, usually just goals and penalties, for the early seasons of the league.
The best way to get familiar with this data is to use it! You can either download directly from this repo or use fastRhockey to scrape the data yourself.
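If you download the CSVs, a first step is usually to line up each game's metadata with its per-team boxscore rows. The sketch below shows one way to do that in Python using only the standard library. It assumes both files share a game ID column named game_id; that name, the other column names, and the sample values are hypothetical stand-ins, so check the real headers before relying on it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column name: the real header in phf_meta_data.csv and
# boxscore.csv may differ, so inspect the first row of each file.
GAME_ID = "game_id"


def read_rows(handle):
    """Read an open CSV file object into a list of dicts keyed by header."""
    return list(csv.DictReader(handle))


def join_boxscores_to_games(meta_rows, box_rows, key=GAME_ID):
    """Attach each game's per-team boxscore rows to its metadata row."""
    box_by_game = defaultdict(list)
    for row in box_rows:
        box_by_game[row[key]].append(row)
    # Games without any boxscore rows get an empty list.
    return [
        {**game, "boxscores": box_by_game.get(game[key], [])}
        for game in meta_rows
    ]


# Tiny in-memory stand-ins for the downloaded files (made-up values).
META_SAMPLE = "game_id,home_team,away_team\n1,Team A,Team B\n"
BOX_SAMPLE = "game_id,team,shots\n1,Team A,30\n1,Team B,25\n"

meta = read_rows(io.StringIO(META_SAMPLE))
box = read_rows(io.StringIO(BOX_SAMPLE))
games = join_boxscores_to_games(meta, box)
print(games[0]["home_team"], len(games[0]["boxscores"]))
```

To run this against the real data, swap the io.StringIO objects for files opened with open("phf_meta_data.csv", newline="", encoding="utf-8") and open("boxscore.csv", newline="", encoding="utf-8"), and adjust GAME_ID to match the actual header.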
Follow SportsDataverse on Twitter and star fastRhockey
To cite the fastRhockey R package in publications, use:
BibTex Citation
@misc{howell_fastRhockey_2021,
author = {Ben Howell},
title = {fastRhockey: The SportsDataverse's R Package for Women's Hockey Data.},
url = {https://benhowell71.github.io/fastRhockey/},
year = {2021}
}