For speed, we’ve already scraped (using nflfastR) and saved punting data for the 1999-2020 seasons. The easiest option is to download the puntr-data repo here and then point puntr::import_punts() to your local copy of the data. You can also download the data directly each time; this takes around 15 minutes.
Import, clean, and calculate as follows:

library(puntr)

# punts_raw <- import_punts(1999:2020, local = TRUE, path = your_local_path)  # recommended
punts_raw <- import_punts(2018:2020)            # direct download; takes ~15 minutes
punts_cleaned <- trust_the_process(punts_raw)   # clean
punts <- calculate_all(punts_cleaned)           # calculate custom Puntalytics metrics
## Calculated RERUN: 0.455 sec elapsed
## Calculated SHARP: 4.643 sec elapsed
## Calculated pEPA: 1.137 sec elapsed
You now have a dataframe, punts, where each row is a punt and each column is a stat relevant to punting (including our custom metrics).
puntr calculates stats using 3-year rolling averages to smooth over the artifacts of anomalous seasons. For this reason, puntr::calculate_all() requires a dataframe containing at least 3 seasons and 1000 punts. Note that the example above uses three seasons. In the example below, three seasons are used for the calculation, after which all but the most recent season are filtered out.
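To make the rolling-window requirement concrete, here is a minimal base-R sketch of the general idea. It is not puntr's internal implementation: `check_window()`, `pooled_season_mean()`, and the toy `net_yards` column are hypothetical. It checks the stated minimums (3 seasons, 1000 punts) and, for each season, averages a stat over that season and the two before it.

```r
# Hypothetical helpers, NOT puntr internals: they only illustrate a
# minimum-data check plus a 3-season pooled window.

# Stop early if the data cannot support a 3-season window.
check_window <- function(df, min_seasons = 3, min_punts = 1000) {
  n_seasons <- length(unique(df$season))
  if (n_seasons < min_seasons || nrow(df) < min_punts) {
    stop(sprintf(
      "need >= %d seasons and >= %d punts (got %d seasons, %d punts)",
      min_seasons, min_punts, n_seasons, nrow(df)
    ))
  }
  invisible(TRUE)
}

# For each season s, average `stat` over every punt from seasons s-2..s.
# Seasons without a complete window in the data get NA.
pooled_season_mean <- function(df, stat, width = 3) {
  seasons <- sort(unique(df$season))
  means <- vapply(seasons, function(s) {
    window <- (s - width + 1):s
    if (!all(window %in% seasons)) return(NA_real_)
    mean(df[[stat]][df$season %in% window])
  }, numeric(1))
  setNames(means, seasons)
}

# Toy data: 4 seasons x 300 punts (net_yards values are simulated).
set.seed(1)
toy <- data.frame(
  season = rep(2018:2021, each = 300),
  net_yards = rnorm(1200, mean = 40, sd = 8)
)

check_window(toy)                     # 4 seasons, 1200 punts: passes
pooled_season_mean(toy, "net_yards")  # NA for 2018 and 2019
```

In this sketch, pooling dilutes a single odd season with its neighbours, at the cost that the first two seasons of any pull lack a full window. That trade-off is one way to read why the examples request extra seasons and then keep only the most recent one.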
The kind folks at nflfastR have set up a convenient SQL-y way to scrape data for in-progress seasons. Rather than redo all of that work, we’ll just share the code we use for this purpose:
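#install.packages("DBI")
#install.packages("RSQLite")
library(DBI)
library(RSQLite)
library(dplyr)     # tbl(), filter(), collect(); tbl() on a database also needs dbplyr installed
library(nflfastR)
library(puntr)

update_db()  # create or update the local play-by-play database (./pbp_db by default)
connection <- dbConnect(SQLite(), "./pbp_db")
pbp <- tbl(connection, "nflfastR_pbp")

punts <- pbp %>%
  filter(punt_attempt == 1) %>%
  filter(season %in% 2019:2021) %>%  # three seasons for calculate_all()
  collect() %>%                      # pull the filtered rows into memory
  trust_the_process() %>%
  calculate_all() %>%
  filter(season == 2021)             # keep only the in-progress season

dbDisconnect(connection)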
Note: New in
puntr 1.3, the functions
are now deprecated, in favor of
To compare punters, use
punters <- by_punters(punts)
to get a dataframe where each row is a punter and each column is an average stat for that punter. The most common standard and Punt Runts stats are included by default, but you can add whatever you like by passing additional arguments to dplyr::summarize(). For example:
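As a minimal sketch of what such extra summary arguments look like, here they are applied with dplyr directly to a toy dataframe. The punter names, values, and the `GrossYards` column are hypothetical stand-ins for real `punts` columns, and the commented last line assumes `by_punters()` forwards named expressions to `dplyr::summarize()` as described above.

```r
library(dplyr)

# Toy stand-in for `punts`; punter names and the GrossYards column are hypothetical.
toy_punts <- data.frame(
  punter = c("A", "A", "A", "B", "B"),
  GrossYards = c(45, 52, 38, 41, 47)
)

# Each extra named expression becomes one column per punter.
toy_punters <- toy_punts %>%
  group_by(punter) %>%
  summarize(
    n_punts = n(),
    max_gross = max(GrossYards),
    pct_50_plus = mean(GrossYards >= 50)  # share of punts of 50+ gross yards
  )

toy_punters

# Assumed puntr equivalent (column name hypothetical):
# punters <- by_punters(punts, max_gross = max(GrossYards))
```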
Let’s take a look at some of the columns in this data frame:
To compare punter seasons, instead use
punter_seasons <- by_punter_seasons(punts)
And finally, to compare punter games, use
punter_games <- by_punter_games(punts)
which gives every unique punter game a row.
Note: If a career, season, or game you’re looking for is missing from your dataframe, try changing the threshold = parameter to require fewer punts.
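To see why lowering the threshold brings missing rows back, here is a base-R sketch of the general idea: count punts per group and drop groups below a minimum. `keep_by_threshold()` and the toy columns are hypothetical, and puntr's default thresholds and exact filtering logic may differ.

```r
# Hypothetical illustration of a minimum-punt threshold; not puntr's code.
toy_games <- data.frame(
  punter = c("A", "A", "A", "B", "C", "C"),
  game_id = c("g1", "g1", "g2", "g1", "g2", "g2")
)

# Count punts per group, then keep only groups meeting the threshold.
keep_by_threshold <- function(df, group_cols, threshold) {
  counts <- aggregate(
    list(n_punts = rep(1L, nrow(df))),
    by = df[group_cols],
    FUN = sum
  )
  counts[counts$n_punts >= threshold, , drop = FALSE]
}

keep_by_threshold(toy_games, c("punter", "game_id"), threshold = 2)  # A-g1, C-g2
keep_by_threshold(toy_games, c("punter", "game_id"), threshold = 1)  # all 4 groups
```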
These dataframes, such as punter_games, should serve as a good starting point for any custom analysis you’d like to do, whether with built-in puntr metrics or your own.
puntr was successfully migrated to cfbfastR in version 1.2.2. NOTE: The by_ family of summary functions has not yet been tested on cfbfastR data, but might work.
puntr can also handle punting data for college football, piggybacking off the scraping abilities of the cfbfastR package. As with NFL data, you need at least 3 seasons’ worth of data to run calculate_all(). Import and clean as follows:
college_punts <- import_college_punts(2019:2021) %>%  # import (calls cfbfastR behind the scenes)
  college_to_pro() %>%                               # rename columns to those used by nflfastR
  calculate_all()                                    # calculate as with NFL data
Now we can use create_miniY to compare college punter seasons (create_miniG would also work here, of course):
miniY_college <- create_miniY(college_punts)