Why Time Series Matters in Football
Football data is fundamentally temporal. A team's strength changes over a season. A player's form rises and falls. Injuries accumulate. Tactics evolve. Standard regression treats every observation as independent — it can't capture the trajectory of performance, only a static snapshot.
Time series forecasting is the discipline of modelling sequential dependencies: how today's value depends on yesterday's, last week's, and last season's. It answers questions regression can't: "Is this team improving or declining?" "Will their xG trend continue?" "How quickly should we update our estimate after a shock result?"
A time series is a sequence of observations indexed by time: y₁, y₂, ..., yₜ. We assume observations near each other in time are correlated — and we exploit that correlation to forecast yₜ₊₁, yₜ₊₂, .... The key insight is that the order matters: shuffling the data destroys the information.
Time Series Decomposition
The first step in any time series analysis is decomposition — breaking the series into interpretable components. The classical additive decomposition is yₜ = Tₜ + Sₜ + Rₜ:
Trend (Tₜ). Long-term direction. Is the team's xG per match rising or falling over the season? Driven by transfers, injuries, tactical changes.
Seasonality (Sₜ). Repeating patterns at fixed intervals. Fixture congestion cycles, mid-season dip, end-of-season dead rubbers. Can be weekly or annual.
Residual (Rₜ). What's left — random variation, one-off shocks (red cards, freak results), and any structure the model didn't capture.
For football, multiplicative decomposition (yₜ = Tₜ × Sₜ × Rₜ) sometimes fits better — when the seasonal swing scales with the level (a top team's form fluctuations are larger in absolute terms). Use STL (Seasonal and Trend decomposition using Loess) for robust estimation.
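The additive split above can be sketched in a few lines of numpy, using a centred moving average as a rough trend estimate. The xG series below is synthetic (made up for illustration); in practice statsmodels' STL does this more robustly, with a loess trend and outlier-resistant weighting.

```python
import numpy as np

def additive_decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual."""
    y = np.asarray(y, dtype=float)
    # Trend: moving average over one full seasonal period (rough at the edges).
    kernel = np.ones(period) / period
    trend = np.convolve(y, kernel, mode="same")
    detrended = y - trend
    # Seasonal: average the detrended values at each position in the cycle.
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal -= seasonal.mean()          # centre so the components sum to y
    seasonal = np.resize(seasonal, len(y))   # tile the pattern over the series
    residual = y - trend - seasonal
    return trend, seasonal, residual

# Synthetic xG-per-match series: rising trend + 4-week congestion cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(38)
y = 1.2 + 0.01 * t + 0.2 * np.sin(2 * np.pi * t / 4) + rng.normal(0, 0.05, 38)
trend, seasonal, resid = additive_decompose(y, period=4)
```

By construction the three components add back to the original series, which is the defining property of the additive model.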
Stationarity & Autocorrelation
Most time series models assume stationarity — the statistical properties (mean, variance, autocorrelation) don't change over time. Raw football data is rarely stationary: teams get better/worse, leagues evolve. The fix is differencing: instead of modelling yₜ directly, model Δyₜ = yₜ − yₜ₋₁.
The autocorrelation function (ACF) measures how correlated the series is with lagged versions of itself. In football, team form has strong positive autocorrelation at short lags — a team that performed well last week is likely to perform well this week. The ACF decaying over lags tells us how quickly the "memory" fades.
ACF. Correlation between yₜ and yₜ₋ₖ. Decays slowly for trending data, cuts off sharply for MA processes. Used to identify MA order.
PACF. Correlation between yₜ and yₜ₋ₖ after removing the effect of intermediate lags. Cuts off sharply for AR processes. Used to identify AR order.
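The sample ACF is simple enough to compute directly. A sketch on a simulated AR(1) "form" series with φ = 0.7 (synthetic data, for illustration) shows the geometric decay that signals strong short-lag memory:

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelation: corr(y_t, y_{t-k}) for k = 0..max_lag."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    denom = np.dot(y, y)
    return np.array([np.dot(y[k:], y[:len(y) - k]) / denom
                     for k in range(max_lag + 1)])

# Simulate an AR(1) series: strong positive autocorrelation at short lags.
rng = np.random.default_rng(1)
phi, n = 0.7, 500
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

r = acf(y, max_lag=5)
# r[1] lands near phi and r[2] near phi**2: geometric decay, the AR signature.
```

statsmodels provides the same computation (plus confidence bands) via `plot_acf` and `plot_pacf`.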
ARIMA Models
ARIMA(p, d, q) is the workhorse of classical time series forecasting. It combines three ideas:
AR(p) — autoregressive. Today's value is a weighted sum of past values. This week's xG depends on last week's and the week before. An AR(1) with φ = 0.7 means 70% of this week's deviation from the mean carries over from last week — strong form persistence.
I(d) — integrated. The differencing order. d = 1 means we model the changes rather than the levels. Most football series need d = 0 or d = 1. If you need d = 2, something is probably wrong with your data.
MA(q) — moving average. Today's value depends on past forecast errors. A shock result (5-0 loss) produces a large εₜ that influences the next few predictions. MA(1) means only the last shock matters; MA(2) means the last two.
Use the ADF test to determine d. Use the ACF/PACF plots to guide p and q. Or let auto.arima() (R) or pmdarima.auto_arima() (Python) search over (p, d, q) combinations using AIC. In practice, football series are short (38 match-weeks) — keep p, q ≤ 2 to avoid overfitting.
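As a minimal illustration of the AR component (not a full ARIMA fit — that is what `auto.arima`/`auto_arima` are for), the AR(1) coefficient can be recovered by least squares on the lagged series. The data below are simulated with φ = 0.7:

```python
import numpy as np

# Fit an AR(1) by least squares: y_t = c + phi * y_{t-1} + eps_t.
rng = np.random.default_rng(2)
phi_true, n = 0.7, 400
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal()

X = np.column_stack([np.ones(n - 1), y[:-1]])   # intercept + lagged value
c_hat, phi_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
# phi_hat recovers the form-persistence parameter, near 0.7 here.
```

Note the sample size: 400 points makes the estimate stable. With a 38-match-week season the same fit would be far noisier, which is exactly why the text advises keeping p, q ≤ 2.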
SARIMA: Adding Seasonality
SARIMA(p,d,q)(P,D,Q)[s] extends ARIMA with seasonal AR, I, and MA terms at lag s. For football with annual cycles, s = 38 (match-weeks per season) — but this requires multiple seasons of data. For within-season patterns with fixture congestion, s = 4 or s = 6 might capture mid-week/weekend rhythms.
Exponential Smoothing (ETS)
While ARIMA models autocorrelation in residuals, exponential smoothing models the series through weighted averages of past observations, with exponentially decaying weights. Newer observations matter more.
This is exactly how Elo ratings work in football! The Elo update R_new = R_old + K(S - E) is an exponential smoothing filter where K plays the role of α. A higher K means faster adaptation to recent results (but noisier estimates).
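That equivalence is easiest to see in code — the same recursion with α relabelled as K. The match "performance scores" below are hypothetical:

```python
# The smoothing recursion level += alpha * (obs - level) has the same shape
# as the Elo update R_new = R_old + K * (S - E): alpha plays the role of K.
def smooth(observations, alpha, level=0.0):
    """Simple exponential smoothing; returns the final level estimate."""
    for obs in observations:
        level = level + alpha * (obs - level)   # forecast error times alpha
    return level

scores = [1.0, 0.8, 1.2, 0.3, 1.1]   # hypothetical per-match scores
fast = smooth(scores, alpha=0.5)     # high alpha/K: adapts quickly, noisier
slow = smooth(scores, alpha=0.1)     # low alpha/K: stable, slow to react
```

After the same five results the high-α estimate sits much closer to recent form than the low-α one — the K trade-off in miniature.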
Simple exponential smoothing. No trend, no seasonality. Good for stable team metrics like pass completion rate.
Holt's method. Adds a linear trend component. Good for teams on an upward or downward trajectory mid-season.
Damped trend. Trend flattens over time — realistic for football (no team improves forever). Often the best default.
ETS and ARIMA are complementary, not competing. ETS(A,N,N) is equivalent to ARIMA(0,1,1). But they differ in how they handle trends and seasonality. ETS is often better for short series with clear level shifts. ARIMA is better for series with complex autocorrelation structures. Use both and compare via cross-validation.
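Holt's damped-trend recursions are short enough to write out directly. A sketch with assumed smoothing parameters (α, β and damping φ — not fitted values) on a made-up upward-trending xG series:

```python
def holt_damped(y, alpha=0.4, beta=0.2, phi=0.9, h=5):
    """Holt's linear method with a damped trend; returns h-step forecasts."""
    level, trend = y[0], y[1] - y[0]            # simple initialisation
    for obs in y[2:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    # Damped forecasts: the trend contribution shrinks by phi each step ahead.
    return [level + sum(phi ** j for j in range(1, k + 1)) * trend
            for k in range(1, h + 1)]

# A team trending upward: forecasts keep rising but flatten out.
fc = holt_damped([1.0, 1.1, 1.3, 1.4, 1.6, 1.7], h=5)
```

The forecast increments shrink geometrically — the "no team improves forever" behaviour. statsmodels' `ExponentialSmoothing(damped_trend=True)` fits α, β and φ by maximum likelihood instead of assuming them.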
Modern Approaches
Prophet. Decomposable model: y(t) = g(t) + s(t) + h(t) + ε, where g is a piecewise-linear or logistic growth trend, s is Fourier seasonality, and h captures holidays/events. Football use: model league-wide goal trends with known structural breaks (transfer windows, COVID, rule changes).
VAR (vector autoregression). Multivariate extension of AR — model multiple time series simultaneously. Football use: jointly forecast xG, possession, and PPDA for a team, capturing cross-correlations (e.g. higher pressing → higher xG next week). yₜ = c + A₁yₜ₋₁ + A₂yₜ₋₂ + εₜ where yₜ is a vector.
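A minimal VAR(1) sketch, fit by least squares on simulated two-dimensional data (the columns could stand for, say, xG and PPDA; statsmodels' `VAR` wraps the same fit with lag selection and diagnostics):

```python
import numpy as np

# VAR(1): y_t = c + A @ y_{t-1} + eps_t, with y_t a 2-vector.
rng = np.random.default_rng(3)
A_true = np.array([[0.5, 0.2],
                   [0.1, 0.6]])         # off-diagonals = cross-correlations
n = 600
Y = np.zeros((n, 2))
for t in range(1, n):
    Y[t] = A_true @ Y[t - 1] + rng.normal(0, 0.1, 2)

X = np.column_stack([np.ones(n - 1), Y[:-1]])    # [1, y1_{t-1}, y2_{t-1}]
coef, *_ = np.linalg.lstsq(X, Y[1:], rcond=None) # shape (3, 2)
A_hat = coef[1:].T                               # estimated lag matrix
```

The recovered off-diagonal terms are what make VAR useful here: they quantify how one metric's past moves another metric's future.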
State-space models. A general framework that includes ETS, ARIMA, and structural time series as special cases. The idea: an unobserved state (true team strength) evolves over time, and we see noisy observations (match results). Kalman filters and particle filters estimate the hidden state recursively. This is exactly how Bayesian Elo and dynamic team ratings work.
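For the scalar random-walk model described here, the Kalman filter is only a few lines. The variances q and r below are assumed for illustration, not estimated:

```python
# State-space model: x_t = x_{t-1} + w_t (hidden strength, variance q),
#                    y_t = x_t + v_t     (noisy observation, variance r).
def kalman_filter(observations, q=0.05, r=1.0, x=0.0, p=10.0):
    """Recursive estimate of the hidden state after each observation."""
    estimates = []
    for y in observations:
        p = p + q                      # predict: uncertainty grows each step
        k = p / (p + r)                # Kalman gain: trust in the new result
        x = x + k * (y - x)            # update: same shape as the Elo step
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Hypothetical noisy match-level results around a true strength of ~1.5.
obs = [2.0, 1.0, 1.8, 1.3, 1.6, 1.4, 1.7]
est = kalman_filter(obs)
```

Note the update line: it is the exponential smoothing/Elo step again, except the gain k is computed from the uncertainties rather than fixed — the filter trusts early results heavily and settles down as evidence accumulates.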
LSTM/GRU networks learn arbitrary nonlinear temporal dependencies. Temporal Fusion Transformers (TFT) add attention over time steps + static covariates. N-BEATS stacks fully-connected blocks for pure time series. These need much more data than 38 match-weeks — better suited for player-level tracking data (thousands of possessions) or multi-league forecasting.
Evaluation & Cross-Validation
Standard k-fold cross-validation is invalid for time series — it leaks future information into the training set. Instead, use expanding window (or sliding window) cross-validation: train on [1, ..., t], test on [t+1, ..., t+h], then expand.
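A sketch of the expanding-window scheme as an index generator (`min_train` and `horizon` are illustrative choices):

```python
def expanding_window_cv(y, min_train=10, horizon=1):
    """Yield (train, test) index lists that never look into the future."""
    for t in range(min_train, len(y) - horizon + 1):
        yield list(range(t)), list(range(t, t + horizon))

y = list(range(38))                      # 38 match-weeks
folds = list(expanding_window_cv(y, min_train=30, horizon=1))
# Each fold trains on match-weeks [0..t) and tests on match-week t only.
```

Every test index is strictly later than every training index in its fold — the property k-fold CV violates. A sliding window is the same loop with the training start advancing alongside t.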
Error Metrics
MAE = (1/n) Σ|yₜ − ŷₜ|
Mean absolute error. Interpretable in original units (e.g. "off by 0.3 xG per match"). Robust to outliers.
RMSE = √((1/n) Σ(yₜ − ŷₜ)²)
Penalises large errors more. Use when big misses matter (e.g. predicting match outcomes for betting).
MAPE = (100/n) Σ|yₜ − ŷₜ|/|yₜ|
Percentage error. Undefined when yₜ = 0 (common in football — 0 goals!). Use MASE instead.
MASE = MAE / MAE_naive
Scaled against the naïve forecast (ŷₜ₊₁ = yₜ). MASE < 1 means you beat the naïve baseline. The gold standard for time series.
Always compare against the naïve forecast: ŷₜ₊₁ = yₜ (last observation carried forward) and the seasonal naïve: ŷₜ₊₁ = yₜ₊₁₋ₛ (same period last season). In football, the naïve baseline is surprisingly hard to beat for short horizons — form is noisy and mean-reverting. If your fancy model can't beat "predict last week's value", it's not useful.
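A sketch of MASE against the last-value-carried-forward naïve, with made-up numbers:

```python
import numpy as np

def mase(y_true, y_pred, y_train):
    """MASE: model MAE scaled by the in-sample naive (lag-1) MAE."""
    mae_model = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    mae_naive = np.mean(np.abs(np.diff(y_train)))   # mean |y_t - y_{t-1}|
    return mae_model / mae_naive

y_train = [1.0, 1.4, 0.9, 1.2, 1.1, 1.5]   # hypothetical xG history
y_true  = [1.3, 1.2]                        # next two observed values
naive   = [y_train[-1]] * 2                 # last value carried forward
score = mase(y_true, naive, y_train)
# score < 1: no better than beating yourself — any real model must do better.
```

Because the denominator is the in-sample naïve error, MASE is unit-free and comparable across teams and metrics, unlike MAE or RMSE.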
Football Applications
Team ratings. Elo is exponential smoothing for team strength. Modern variants (FiveThirtyEight's SPI, ClubElo) add mean reversion between seasons, home advantage adjustment, and margin-of-victory weighting. Mathematically equivalent to a state-space model with xₜ = xₜ₋₁ + wₜ (random walk state) and yₜ = f(xₜ) + vₜ (observed result).
xG trend forecasting. Track a team's rolling xG per match over a season. Fit ARIMA or ETS to predict whether their attacking output will improve or decline. Useful for transfer scouting: is this team creating chances because of a hot streak (mean-reverting) or structural tactical change (trending)?
Player form. Model individual player metrics (goals, assists, key passes per 90) as time series. Exponential smoothing detects when a player enters a hot streak or declines. The smoothing parameter α balances recency vs stability — higher α for volatile strikers, lower for consistent midfielders.
Injury risk and workload. Player workload (distance, sprints, high-intensity minutes) forms a time series. Acute-to-chronic workload ratios use exponentially weighted moving averages to flag injury risk. When the short-term load spikes relative to the long-term average, injury probability increases — a direct time series application.
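A sketch of an EWMA-based acute:chronic ratio — the daily load values and smoothing constants below are hypothetical:

```python
def ewma(values, alpha):
    """Exponentially weighted moving average, updated recursively."""
    avg = values[0]
    out = [avg]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
        out.append(avg)
    return out

# Hypothetical daily high-intensity distance (metres); spike on the last day.
load = [500, 520, 480, 510, 505, 495, 900]
acute   = ewma(load, alpha=0.5)   # short memory: reacts to the last few days
chronic = ewma(load, alpha=0.1)   # long memory: reflects recent weeks
acwr = acute[-1] / chronic[-1]    # ratio well above 1 flags a load spike
```

The two smoothing constants are the whole design: the acute average jumps with the spike while the chronic one barely moves, so the ratio surfaces the mismatch.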
Betting markets. Odds move over time as information arrives. Model the evolution of implied probabilities as a time series to detect when the market over-reacts to news (mean-reversion opportunity) or under-reacts (momentum). VAR models can jointly track odds movement across multiple bookmakers.
Season simulation. Forecast remaining match outcomes using team-strength time series, then simulate the rest of the season thousands of times. This gives probabilistic league table projections: P(team wins league), P(relegation), etc. The key is using time-varying team strengths, not static estimates from the full season.
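A toy Monte Carlo of a two-team title race. The win model below is an illustrative strength-ratio assumption (not a fitted forecast), and all strengths, fixtures, and points are made up:

```python
import random

def title_probability(strengths, remaining, current_points, sims=5000, seed=7):
    """Simulate remaining fixtures; return P(each team finishes top)."""
    random.seed(seed)
    wins = {team: 0 for team in strengths}
    for _ in range(sims):
        pts = dict(current_points)
        for home, away in remaining:
            # Toy win model: home win chance from the strength share.
            p_home = strengths[home] / (strengths[home] + strengths[away])
            r = random.random()
            if r < p_home * 0.75:                 # rough win/draw split
                pts[home] += 3
            elif r < p_home * 0.75 + 0.25:
                pts[home] += 1; pts[away] += 1
            else:
                pts[away] += 3
        leader = max(pts, key=pts.get)
        wins[leader] += 1
    return {team: w / sims for team, w in wins.items()}

probs = title_probability(
    strengths={"A": 2.0, "B": 1.0},
    remaining=[("A", "B"), ("B", "A")],
    current_points={"A": 70, "B": 68},
)
```

In a real pipeline the `strengths` input would itself come from a time series model (smoothed ratings at the current matchweek), which is the "time-varying strengths" point above.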
Common Pitfalls
Look-ahead bias. The deadliest sin. If any information from time t+1 leaks into your model at time t, your backtest will look amazing but your live predictions will fail. This includes using future data for feature scaling, imputation, or even train/test splitting. Always split temporally.
Overfitting short series. A Premier League season has 38 match-weeks — that's 38 data points per team. ARIMA(2,1,2) has 5 parameters. A neural network has thousands. With 38 points, stick to simple models (ETS, ARIMA with low order) or use cross-league pooling to increase sample size.
Horizon mismatch. A model that predicts next week well might be terrible at predicting 10 weeks ahead. Always evaluate at the horizon you care about. For betting, h = 1 (next match). For squad planning, h = 10-20. Forecast uncertainty grows with horizon — confidence intervals widen.
Structural breaks. A new manager, a key transfer, a red card — these are structural breaks that invalidate the history. Standard ARIMA doesn't handle this. Solutions: use change-point detection (PELT, BOCPD), reset the model after known breaks, or use Bayesian online methods that adapt quickly.
Practical Tips
Start with baselines. Before trying anything fancy, compute the naïve (last value) and seasonal naïve (same matchweek last season) forecasts. Report MASE relative to these. If your model can't beat MASE = 1.0, use the naïve forecast instead.
Model xG, not goals. Goals are highly variable — a team can score 0 with 3.0 xG. Time series models on raw goals will be noisy and slow to detect real changes. Model xG (or xGA, or xGD) for smoother, more predictable series. The signal-to-noise ratio is much better.
Pool across teams. Instead of fitting 20 separate models (one per team), fit a hierarchical model where teams share hyperparameters (e.g. a common AR coefficient or smoothing rate). This borrows strength across teams and works much better with short series. See the mixed effects models article.
Tools. R: forecast package (auto.arima, ets), fable (tidyverse-native). Python: statsmodels (ARIMA, ETS), pmdarima (auto_arima), prophet, darts (unified interface for classical + neural).
Summary
- Decomposition: trend + seasonality + residual
- Stationarity: difference until the ADF test rejects a unit root
- ACF/PACF identify AR and MA orders
- ARIMA(p,d,q) for autocorrelated series
- ETS for level/trend/seasonal smoothing
- Expanding-window CV, never k-fold
- MASE for scale-free comparison
Football is a time series problem disguised as a cross-sectional one. Every rating system, every form metric, every "rolling average" is implicitly a time series model. Making that explicit — choosing the right smoothing, testing for stationarity, validating with proper temporal CV — separates rigorous analysis from ad-hoc number crunching. Start with exponential smoothing (it's Elo!), graduate to ARIMA when you need confidence intervals, and reach for neural forecasters only when you have enough data to justify the complexity.