CRAN Task View: Sports Analytics
This CRAN Task View contains a list of packages useful for sports analytics. Most of the packages are sport-specific and are grouped as such. However, we also include a General section for packages that provide ancillary functionality relevant to sports analytics (e.g., team-themed color palettes), and a Modeling section for packages useful for statistical modeling. Throughout the task view, and collected in the Related links section at the end, we have included a list of selected books and articles that use some of these packages in substantive ways. Our goal in compiling this list is to help researchers find the tools they need to complete their work in R.
To be considered for inclusion, the package must be useful for conducting sports analytics. Most packages provide functionality for some combination of:
- acquiring data for a specific sport or league
- performing common computations on sport-specific data
Esports and sports betting packages are within scope.
The list of packages is aspirationally comprehensive. If there is a sports analytics package on CRAN that we have missed, please let us know. Contributions are always welcome and encouraged; please see the linked GitHub repository for details.
Sport-Specific Packages
Football (American) 🏈
- nflverse is a collection of packages for obtaining and analyzing NFL data. The core nflverse includes nflfastR, nflseedR, nfl4th, nflreadr, and nflplotR.
- nflfastR contains functions to efficiently scrape NFL play-by-play data from 1999 to present. It is similar to nflscrapR, but much faster. All models required by nflfastR are hosted in fastrmodels.
- nflreadr efficiently downloads data from GitHub repositories of the nflverse project, including pre-computed nflfastR data frames.
- nfl4th consists of functions to calculate optimal fourth-down decisions in the National Football League. Data on fourth downs are collected from the NFL and ESPN.
- nflseedR contains functions for ranking NFL teams based on the complex NFL tie breaking rules. It includes division ranking, playoff seeding, and draft order.
- nflplotR includes functions that make it easier to visualize NFL data in ggplot2.
- NFLSimulatoR consists of tools for simulating plays and drives and for evaluating in-game strategies in the NFL.
- ffscrapr helps access various fantasy football APIs including MFL, Sleeper, ESPN, and Fleaflicker with a consistent interface and built-in authentication, rate-limiting, and caching.
- gsisdecoder contains functions to decode NFL Player IDs for use in conjunction with the nflfastR package.
Football (Soccer) ⚽
- worldfootballR provides clean and tidy football data from a number of popular sites, including FBref, Transfermarkt (transfer and valuation data), and Understat and fotmob (shot location data).
- socceR provides functions for evaluating soccer predictions and simulating results from soccer matches and tournaments.
- ggsoccer provides functions for visualizing soccer event data in ggplot2.
- footballpenaltiesBL contains data and plotting functions for analyzing penalty kicks in the German Men’s Bundesliga from 1963-64 to 2016-17.
- footBayes consists of functions for fitting widely known soccer models (double Poisson, bivariate Poisson, Skellam, Student’s t) through Hamiltonian Monte Carlo and Maximum Likelihood estimation approaches using Stan. The package also provides tools for visualizing team strengths and predicting match outcomes.
- itscalledsoccer enables access to American soccer (MLS, NWSL, and USL) data through the American Soccer Analysis app API.
- FPLdata contains functions for retrieving player attributes on Fantasy Premier League.
- EUfootball provides European football match results for top leagues in England, France, Germany, Italy, Spain, Netherlands, and Turkey from 2010-2011 to 2019-2020.
- bundesligR contains all final standings of the Bundesliga in Germany from 1964 to 2016.
Baseball ⚾
- Historical baseball data is available through the Lahman package, which contains season-level data for Major League Baseball going back to 1871.
- baseballr consists of functions for extracting and analyzing baseball data from various sources such as Baseball Reference, FanGraphs, and Baseball Savant. The package is featured prominently in the 3rd edition of Albert, J., Marchi, M., and Baumer, B. S. (2024). Analyzing Baseball Data with R (doi:10.1201/9781351107099) and largely replaces the now-defunct pitchRx package.
- retrosheet facilitates downloading game logs, team IDs, rosters, play-by-play data, and other files from Retrosheet.org, and returns the results as data frames. Local caching can be employed to improve efficiency. Note that the play-by-play data returned comes directly from the event files and is not parsed (i.e., Chadwick is not bundled).
- mlbstats provides functions for vector-based computation of many baseball statistics, both traditional and sabermetric.
- mlbplotR contains tools for visualizing MLB analysis with ggplot2 and gt.
Basketball 🏀
- BAwiR is a collection of tools to analyze basketball data, with focus on data scraping and visualization.
- AdvancedBasketballStats provides functions to calculate and analyze basketball statistics for players, teams, lineups (quintets), and plays.
- uncmbb contains data on University of North Carolina (at Chapel Hill) Men’s Basketball Results since the 1949-50 season.
- BasketballAnalyzeR accompanies the book Basketball Data Science With Applications in R. This package includes data and functions to analyze and visualize basketball data.
- NBAloveR is an R interface for accessing basketball data from the Basketball Reference API. This package also contains functions to help users analyze basketball data.
- wehoop provides functions for accessing women’s college basketball and WNBA data from the ESPN API.
- hoopR consists of functions for accessing men’s college basketball and NBA data from various sources, including ESPN, NBA Stats API, and Ken Pomeroy’s college basketball ratings.
- euroleaguer provides functions for retrieving data from the EuroLeague and EuroCup APIs.
- nblR allows users to obtain play-by-play, shooting locations, and box score statistics for the National Basketball League in Australia.
Chess ♟
- chess is an opinionated wrapper for R around python-chess. It reads and writes PGN files and SVGs of game boards.
- Like chess, bigchess reads and writes PGN files. It also provides an API to UCI chess engines and can read multiple game files at once without copying them to RAM.
- ChessGmooG contains FIDE Ratings for chess players for 2015 and 2020.
Cricket 🏏
- yorkr provides functions for analyzing statistics of cricket players and teams based on Cricsheet data.
- cricketr is a collection of tools for analyzing cricket performances of players and teams based on ESPN Cricinfo Statsguru data.
- cricketdata includes functions to obtain international cricket data from two major sources, ESPNCricinfo and Cricsheet.
- howzatR consists of functions for calculating various cricket statistics.
Esports 🎮
GPS Tracking 📍
- trackeR and trackeRapp provide tools for analyzing running, cycling and swimming data from GPS-enabled tracking devices within R. These two packages allow users to tidy and explore data from workouts and competitions.
- rStrava contains functions to access Strava activity data from the Strava API.
- A detailed overview of tools for processing and analyzing tracking data can be found in the Tracking CRAN Task View.
Hockey 🏒
- NHLData contains scores from NHL games dating back to 1917. Data are stored one season at a time and contain scores for every game during a particular season.
- Access to data exposed by the NHL API is provided by the nhlapi package.
- fastRhockey provides API wrappers for the NHL and Premier Hockey Federation (PHF), formerly known as the National Women’s Hockey League (NWHL).
Softball 🥎
- runexp provides methods for estimating runs scored in softball. In particular, runexp centers around theoretical expectation using discrete Markov chains and empirical distribution using multinomial random simulation.
Swimming 🏊
- SwimmeR reads swimming results in a variety of formats and returns them as a tidy data frame. It also includes functions for converting times between short-course yards (SCY), short-course meters (SCM), and long-course meters (LCM).
Track and Field 🏃
Volleyball 🏐
General
- teamcolors provides color palettes, ggplot2 themes, xaringan themes, and logos for professional teams across a variety of sports and leagues. teamcolors was originally designed to create the data graphics in Lopez, et al. (2018) (doi:10.1214/18-AOAS1165).
- colorr contains color palettes for professional sports teams in the EPL, MLB, NBA, NHL, and NFL.
- nbapalettes contains color palettes inspired by NBA team jersey colors.
- gameR contains color palettes inspired by video games.
- sleeperapi offers functions for gathering data from the Sleeper API for fantasy sports.
- sportyR contains functions for creating ggplot2 representations of sports playing surfaces pursuant to rule-book specifications. This is particularly useful for plotting player tracking data.
- SportsTour provides functions for displaying tournament fixtures using knock-out and round robin methods.
- injurytools provides functionality for analyzing, visualizing, and modeling sports injuries.
- ISAR contains datasets used in the textbook Introduction to Sports Analytics using R.
Modeling
A wide array of functions for modeling in sports analytics are available in base R (e.g., lm() and glm() from the stats package). In addition, other CRAN Task Views such as Bayesian, MachineLearning, Robust, Spatial, and SpatioTemporal may contain appropriate packages for applying statistical methods to sports.
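As a rough illustration of the kind of modeling that lm() and glm() support, the sketch below fits a linear model for score margin and a logistic regression for home wins. All data are simulated; the variable names (rating_diff, margin, home_win) and the coefficients used to generate them are assumptions made purely for the example, not values taken from any package or league.

```r
# Simulated data only: every number below is invented for illustration.
set.seed(42)
n_games <- 500

# Hypothetical pre-game rating difference (home team minus away team).
rating_diff <- rnorm(n_games, mean = 0, sd = 100)

# Assumed data-generating process for the final score margin (home minus away).
margin <- 2 + 0.04 * rating_diff + rnorm(n_games, mean = 0, sd = 12)

# Assumed data-generating process for a binary home-win indicator.
home_win <- rbinom(n_games, size = 1, prob = plogis(0.3 + 0.006 * rating_diff))

games <- data.frame(rating_diff, margin, home_win)

# Linear model: expected score margin as a function of rating difference.
margin_fit <- lm(margin ~ rating_diff, data = games)

# Logistic regression: probability of a home win given rating difference.
win_fit <- glm(home_win ~ rating_diff, data = games, family = binomial())

# Predicted home-win probabilities for an even matchup and a 100-point favorite.
new_games <- data.frame(rating_diff = c(0, 100))
win_probs <- predict(win_fit, newdata = new_games, type = "response")

print(coef(margin_fit))
print(win_probs)
```

With real data, the simulated data frame would be replaced by game-level results obtained from one of the sport-specific packages listed above.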
Betting
- oddsapiR provides tools for accessing sports odds from The Odds API.
- odds.converter contributes functions for converting common sports betting odds types, including US odds, Hong Kong odds, Decimal odds, Indonesian odds, Malaysian odds, and raw probability.
- implied is a collection of functions that convert between bookmaker odds and probabilities, based on various algorithms.
- pinnacle.data contains Pinnacle market odds, highlighted by a dataset of all wagering lines for the 2016 MLB season.
- RKelly computes the Kelly criterion for betting and provides functions to calculate outcome probabilities for multi-leg contests.
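The conversions and the Kelly criterion mentioned in this list can be sketched in a few lines of base R. The function names below (us_to_decimal, decimal_to_prob, remove_margin, kelly_fraction) are hypothetical helpers written for this example and are not the interfaces of odds.converter, implied, or RKelly. The margin removal shown is only the simplest proportional normalization, one of several possible approaches.

```r
# Hypothetical helpers for illustration; not the API of any package listed above.

# Convert US (moneyline) odds to decimal odds.
us_to_decimal <- function(us) {
  stopifnot(all(us != 0))
  ifelse(us > 0, 1 + us / 100, 1 + 100 / abs(us))
}

# Raw implied probability from decimal odds (still includes the bookmaker margin).
decimal_to_prob <- function(decimal) {
  1 / decimal
}

# Proportional normalization: rescale raw probabilities for one event to sum to 1.
remove_margin <- function(decimal) {
  raw <- 1 / decimal
  raw / sum(raw)
}

# Kelly fraction for a single bet: f = (b * p - q) / b, with b = decimal - 1
# and q = 1 - p. Negative values (no edge) are truncated to zero.
kelly_fraction <- function(p, decimal) {
  b <- decimal - 1
  f <- (b * p - (1 - p)) / b
  pmax(f, 0)
}

decimal_odds <- us_to_decimal(c(150, -200))   # 2.5 and 1.5
raw_probs    <- decimal_to_prob(decimal_odds) # 0.4 and about 0.667
fair_probs   <- remove_margin(c(1.9, 1.9))    # 0.5 and 0.5
stake        <- kelly_fraction(p = 0.45, decimal = 2.5)  # about 0.083

print(decimal_odds)
print(raw_probs)
print(fair_probs)
print(stake)
```

Here the bettor's own probability estimate (0.45) exceeds the 0.40 implied by the odds, so the Kelly fraction is positive; if the estimate were below the implied probability, the function would return zero.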
Ratings
CRAN packages
Core: baseballr, BAwiR, BradleyTerry2, Lahman, nflverse.
Regular: AdvancedBasketballStats, BasketballAnalyzeR, bigchess, bundesligR, chess, ChessGmooG, colorr, combinedevents, cricketdata, cricketr, CSGo, elo, EloChoice, EloOptimized, EloRating, EUfootball, euroleaguer, f1dataR, fastRhockey, fastrmodels, ffscrapr, fitzRoy, footballpenaltiesBL, footBayes, FPLdata, gameR, ggsoccer, gsisdecoder, hoopR, howzatR, implied, injurytools, ISAR, itscalledsoccer, mlbplotR, mlbstats, mvglmmRank, NBAloveR, nbapalettes, nblR, nfl4th, nflfastR, nflplotR, nflreadr, nflseedR, NFLSimulatoR, nhlapi, NHLData, odds.converter, oddsapiR, opendotaR, pinnacle.data, piratings, PlayerRatings, rbedrock, RDota2, retrosheet, RKelly, ROpenDota, rStrava, runexp, sleeperapi, socceR, SportsTour, sportyR, SwimmeR, teamcolors, trackeR, trackeRapp, uncmbb, volleystat, wehoop, welo, worldfootballR, yorkr.
Related links
- Lopez, M. J., Matthews, G. J., and Baumer, B. S. (2018). How often does the best team win? A unified approach to understanding randomness in North American sport. The Annals of Applied Statistics, 12(4), 2483-2516.
- Constantinou, A. C., Fenton, N. E., and Neil, M. (2013). Profiting from an inefficient Association Football gambling market: Prediction, Risk and Uncertainty using Bayesian networks. Knowledge-Based Systems, 50, 60-86.
- Zuccolotto, P., and Manisera, M. (2020). Basketball data science: with applications in R. CRC Press.
- Marchi, M., Albert, J., and Baumer, B. S. (2018). Analyzing baseball data with R. 2nd edition. Chapman and Hall/CRC.
- Bradley, R. A., and Terry, M. E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4), 324-345.
Other resources