Why Mixed Effects?
You want to know whether playing at home increases a team's expected goals. You have 10,000 match observations across 20 teams over 10 seasons. A standard linear regression would treat all 10,000 observations as independent — but they're not. Matches from the same team are correlated (Man City's home xG is consistently higher than Burnley's). Matches from the same season share conditions (COVID empty stadiums reduced home advantage league-wide).
Mixed effects models (also called hierarchical linear models, multilevel models, or random effects models) handle this nested structure explicitly. They model both the population-level pattern (fixed effects) and the group-level variation (random effects), giving you correct standard errors, better predictions for small-sample groups, and the ability to quantify variation at each level of the hierarchy.
"Mixed" because the model contains a mixture of fixed effects (parameters that are the same for everyone — like the effect of being at home) and random effects (parameters that vary by group — like each team's baseline quality). The fixed effects tell you "what's true in general"; the random effects tell you "how does each group deviate from the general pattern".
The Problem with Ignoring Structure
What goes wrong if you just run ordinary linear regression on nested data?
Standard regression assumes independent errors. With 380 matches per season from 20 teams, you don't have 380 independent observations — you have ~20 clusters of correlated data. Treating them as independent underestimates standard errors by a factor of 2-5×, making everything look "significant" when it isn't.
The relationship between variables can be completely different at different levels. Across teams, higher possession correlates with more goals. But within a team, having 70% vs 55% possession in a specific match might not predict more goals at all — they might just be dominating a weak opponent. Mixed models separate between-group and within-group effects.
A promoted team has only 10 matches of data. Fitting a separate model gives noisy estimates. Pooling all teams ignores that they're different. Mixed models find the sweet spot: partial pooling — borrow strength from other teams while still allowing the promoted team its own estimate.
The Mathematics
Random Intercept Model
The simplest mixed effects model adds a random intercept for each group. For observation i in group j:

yᵢⱼ = β₀ + β₁xᵢⱼ + uⱼ + εᵢⱼ,  with uⱼ ~ N(0, σ²ᵤ) and εᵢⱼ ~ N(0, σ²ₑ)

- β₀: grand mean across all groups — "the average team's baseline"
- β₁: effect of x on y, shared by all groups — "the universal relationship"
- uⱼ: group j's deviation from the grand mean — "how team j differs"
- σ²ᵤ: how much groups differ from each other — "how variable are teams?"
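A quick simulation sketch of a random-intercept data-generating process (all parameter values here are illustrative, not estimates from real football data). It shows that the variance of the group means recovers the between-group component σ²ᵤ once the noise the means inherit is subtracted:

```python
import numpy as np

# Simulate y_ij = b0 + b1*x_ij + u_j + e_ij with illustrative parameters.
rng = np.random.default_rng(42)

n_teams, n_matches = 200, 50
beta0, beta1 = 1.4, 0.03          # grand mean, shared slope
sigma_u, sigma_e = 0.3, 0.5       # between-team and within-team SDs

u = rng.normal(0.0, sigma_u, n_teams)               # team deviations u_j
x = rng.normal(10.0, 3.0, (n_teams, n_matches))     # predictor (e.g. shots)
eps = rng.normal(0.0, sigma_e, (n_teams, n_matches))
y = beta0 + beta1 * x + u[:, None] + eps

# Variance of the team means (after removing the known slope term)
# estimates sigma_u^2 plus the noise the means inherit, sigma_e^2 / n.
team_means = (y - beta1 * x).mean(axis=1)
between_var = team_means.var(ddof=1) - sigma_e**2 / n_matches
print(f"estimated sigma_u^2 = {between_var:.3f} (true {sigma_u**2:.3f})")
```

With many groups the estimate is tight; with 20 teams it would be far noisier, which is exactly why the number of groups matters so much later on.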
Random Slope Model
Sometimes the effect of a predictor varies by group, not just the baseline. Add a random slope:

yᵢⱼ = β₀ + β₁xᵢⱼ + u₀ⱼ + u₁ⱼxᵢⱼ + εᵢⱼ,  with (u₀ⱼ, u₁ⱼ) ~ N(0, Σ)
Now each team has its own intercept and its own slope. The covariance matrix Σ captures the correlation between intercepts and slopes (e.g., teams with higher baselines might have flatter slopes — ceiling effects).
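A small sketch of drawing correlated (intercept, slope) pairs from Σ; the negative off-diagonal encodes the ceiling effect described above. The numbers are purely illustrative:

```python
import numpy as np

# Draw correlated deviations (u0_j, u1_j) ~ N(0, Sigma). The negative
# covariance means high-baseline teams tend to get flatter slopes.
rng = np.random.default_rng(7)
Sigma = np.array([[0.30, -0.04],
                  [-0.04, 0.02]])
effects = rng.multivariate_normal([0.0, 0.0], Sigma, size=5000)

sample_cov = np.cov(effects.T)   # should be close to Sigma
print(sample_cov.round(3))
```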
Crossed vs. Nested Random Effects
Football data often has crossed random effects, not just nested ones:
In a crossed design, every group at one level can appear with any group at another (any team can face any team, any referee can officiate any match). This is different from nesting (where players belong to exactly one team). Mixed models handle both.
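A toy illustration of a crossed design (team and referee names are made up): every team appears both home and away, and any referee can be assigned to any fixture, so neither factor nests inside the other:

```python
from itertools import product

# Toy crossed design: any team can face any other team, and any referee
# can officiate any fixture.
teams = ["ARS", "BUR", "LIV", "MCI"]
referees = ["Ref A", "Ref B"]

matches = [
    {"home": h, "away": a, "referee": r}
    for (h, a), r in product(product(teams, teams), referees)
    if h != a
]

# The factors cross rather than nest, so a model of these matches needs
# a separate random effect for teams and for referees.
print(len(matches))   # 4 * 3 * 2 = 24 fixtures
```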
Partial Pooling (The Key Insight)
The central magic of mixed effects models is partial pooling — also called shrinkage. There are three approaches to estimating group-level parameters:
No pooling: fit a separate model for each group. Each team gets its own intercept and slope from its own data only. Problem: teams with 10 matches get wildly noisy estimates.
Complete pooling: ignore groups entirely — one model for all data. Every team is assumed identical. Problem: ignores real differences between City and Burnley.
Partial pooling: each group's estimate is a weighted average of its own data and the global mean. The weight depends on sample size: groups with lots of data keep their own estimate, groups with little data get "shrunk" toward the grand mean. This is automatic regularisation — the model learns how much to trust each group.
For a random intercept model, the group estimate is:

β̂ⱼ = wⱼ·ȳⱼ + (1 − wⱼ)·ȳ,  where wⱼ = nⱼ / (nⱼ + σ²ₑ/σ²ᵤ)
The shrinkage factor nⱼ / (nⱼ + σ²ₑ/σ²ᵤ) is near 1 when nⱼ is large (trust the group) and near 0 when nⱼ is small (trust the global mean). The ratio σ²ₑ/σ²ᵤ controls the "reliability" — if groups are very different (high σ²ᵤ), less shrinkage occurs.
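The shrinkage factor is simple enough to compute directly. A sketch with illustrative variance components (not fitted values), comparing a promoted team with 10 matches to an established team with 380:

```python
# Shrinkage weight w_j = n_j / (n_j + sigma_e^2 / sigma_u^2).
# The variance components below are illustrative, not fitted values.
sigma_e2 = 0.25   # within-team (residual) variance
sigma_u2 = 0.09   # between-team variance

def shrinkage_weight(n_j: int) -> float:
    """Weight kept on the group's own mean; 1 - w_j goes to the grand mean."""
    return n_j / (n_j + sigma_e2 / sigma_u2)

w_new, w_old = shrinkage_weight(10), shrinkage_weight(380)
print(f"10 matches: keep {w_new:.2f}; 380 matches: keep {w_old:.2f}")

# Partial-pooling estimate: weighted average of team mean and league mean.
team_mean, league_mean = 2.1, 1.4     # hypothetical xG per match
shrunk = w_new * team_mean + (1 - w_new) * league_mean
print(f"shrunk estimate for the promoted team: {shrunk:.2f}")
```

The promoted team's raw mean of 2.1 is pulled noticeably toward the league mean, while a team with a full data history keeps nearly all of its own estimate.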
Estimation Methods
Mixed models can't be fitted with ordinary least squares — the variance components (σ²ᵤ, σ²ₑ) must be estimated alongside the fixed effects. The main approaches:
Maximum likelihood (ML): maximise the marginal likelihood with random effects integrated out. Tends to underestimate variance components because it doesn't account for the degrees of freedom used by the fixed effects.
Restricted maximum likelihood (REML): the standard choice. Corrects ML's bias by maximising the likelihood of a transformed set of residuals. Gives unbiased variance estimates. Used by default in R's lme4 and Python's statsmodels.
Bayesian estimation: place priors on all parameters (fixed effects, variance components) and sample from the posterior via MCMC. Gives full uncertainty quantification — especially valuable when you have few groups (e.g., 5 leagues) where frequentist variance estimates are unreliable. R's brms makes this easy.
A common rule of thumb: you need at least 5-10 groups to estimate between-group variance reliably with REML. With 3 leagues, the σ²ᵤ estimate will be very uncertain — consider Bayesian estimation with an informative prior, or just use fixed effects for the groups instead.
Model Specification in Practice
Mixed effects models are typically specified using the Wilkinson formula notation. Here's how common football models map to R/Python syntax:
- xG ~ home + possession + (1 | team) — each team gets its own baseline xG; home and possession effects are shared.
- xG ~ home + possession + (1 + possession | team) — each team has its own baseline AND its own possession→xG relationship.
- goals ~ home + (1 | attack_team) + (1 | defence_team) + (1 | referee) — separate random effects for home attack, away defence, and referee bias.
- performance ~ (1 | league/team/player) — players nested in teams nested in leagues; each level adds its own random intercept.
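As a sketch, here is a random-intercept model fitted on simulated data with statsmodels' formula interface, where the groups= argument plays the role of lme4's (1 | team). All parameter values are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate 20 teams x 100 matches with a team random intercept and a
# true home effect of 0.25 (illustrative numbers throughout).
rng = np.random.default_rng(0)
team = np.repeat(np.arange(20), 100)
u = rng.normal(0.0, 0.3, 20)[team]          # team random intercepts
home = rng.integers(0, 2, team.size)
xg = 1.3 + 0.25 * home + u + rng.normal(0.0, 0.5, team.size)
df = pd.DataFrame({"xg": xg, "home": home, "team": team})

# "xg ~ home" gives the fixed effects; groups= adds (1 | team).
fit = smf.mixedlm("xg ~ home", df, groups=df["team"]).fit()
print(fit.params["home"])   # recovers roughly the true 0.25
```

The fitted summary also reports the between-team variance ("Group Var"), which is the σ²ᵤ that partial pooling depends on.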
Intraclass Correlation (ICC)
The ICC tells you how much of the total variance is explained by the grouping structure:

ICC = σ²ᵤ / (σ²ᵤ + σ²ₑ)

- Low ICC (near 0): groups are similar — pooling is fine.
- Moderate ICC: clustering matters — a mixed model is warranted.
- High ICC: strong group effects — ignoring them is dangerous.
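A minimal helper computing the ICC from the two variance components (the input values here are hypothetical):

```python
def icc(sigma_u2: float, sigma_e2: float) -> float:
    """Intraclass correlation: share of total variance between groups."""
    return sigma_u2 / (sigma_u2 + sigma_e2)

# Hypothetical variance components from a fitted team model:
print(round(icc(0.09, 0.25), 2))   # about a quarter of the variance is between teams
```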
Football Applications
The classic goals model is inherently a mixed model: home goals ~ Poisson(λ₁) where log(λ₁) = μ + αᵢ − δⱼ + γ, with γ the home-advantage term. The attack strengths αᵢ and defence strengths δⱼ are random effects — they're team-specific deviations drawn from a common population. Treating them as random rather than fixed gives partial pooling: newly promoted teams get shrunk toward the league average, which produces much better early-season predictions.
Separating player ability from team context. A player's per-match xG contribution depends on their ability and the system they play in. Model: xG ~ position + age + (1 | player) + (1 | team). The player random effect captures ability after controlling for team quality. Crucial for transfer valuation — you want the player effect, not the "playing for Man City" effect.
Do referees systematically favour home teams? Model: fouls ~ home + (1 | referee) + (1 | team). Crossed random effects let you estimate each referee's home-bias deviation while controlling for teams. Some referees might give 0.3 more fouls to away teams than average; the mixed model quantifies this with proper uncertainty bounds.
Comparing xG models across leagues. Players and teams are nested within leagues; leagues have different styles. A three-level model (1 | league/team/player) decomposes variance: how much is league-level (style), team-level (quality), and player-level (individual talent)? Essential for scouting players from lower-tier leagues.
Home advantage isn't constant — it varies by team (atmosphere), by league (culture), and over time (COVID showed ~40% reduction). Model: goals ~ (1 + home | team) + (1 | season). Random slopes let each team have its own home boost; season random effects capture temporal shifts. You discover that Anfield is worth +0.4 xG while the Amex is +0.1.
Growth curve modelling: performance ~ age + age² + (1 + age | player). The fixed effects give the average career arc (peak at ~27). Random slopes let each player have their own trajectory — some peak at 24, others at 30. Predict future performance by extrapolating each player's random slope. Better than one-size-fits-all aging curves.
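The peak of the quadratic fixed-effect arc falls at age = −β₁/(2β₂), and a player's random slope deviation shifts their personal peak. A sketch with illustrative coefficients chosen so the population peak lands near 27 (these are not fitted values):

```python
# performance = b0 + (b1 + slope_dev)*age + b2*age^2 peaks where the
# derivative is zero: age = -(b1 + slope_dev) / (2*b2).
# Coefficients are illustrative, chosen so the population peak is ~27.
b1, b2 = 0.54, -0.01

def peak_age(slope_dev: float = 0.0) -> float:
    """Peak age for a player with the given random slope deviation."""
    return -(b1 + slope_dev) / (2 * b2)

print(peak_age(0.0))     # population peak, around 27
print(peak_age(-0.06))   # an early peaker
print(peak_age(+0.06))   # a late peaker
```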
Generalised Linear Mixed Models (GLMMs)
Most football outcomes aren't normally distributed. Goals are counts (Poisson), shots on target are yes/no (Bernoulli), pass completion is a proportion (Binomial). GLMMs combine mixed effects with non-normal distributions via a link function:
- Poisson: goals are counts. log(λᵢⱼ) = β₀ + β₁·home + u₀ⱼ. The Dixon-Coles model with partial pooling.
- Bernoulli (logistic): shot conversion is yes/no. logit(pᵢⱼ) = β₀ + β₁·distance + u₀ⱼ. xG model with player random effects.
- Binomial: pass completion is a proportion (k/n). logit(pᵢⱼ) = β₀ + β₁·pressure + u₀ⱼ. Pass-accuracy model per player.
- Negative binomial: shots are overdispersed counts. Handles teams with high variance in shot volume.
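The link functions themselves are one-liners. A sketch showing how the same linear predictor means different things under each link:

```python
import math

# Links used by the GLMMs above: log for Poisson rates, logit for
# Bernoulli/Binomial probabilities.
def logit(p: float) -> float:
    return math.log(p / (1 - p))

def expit(eta: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

# A linear predictor of -0.4 is a rate of about 0.67 goals on the log
# scale, but a probability of about 0.40 through the inverse logit.
print(math.exp(-0.4), expit(-0.4))
```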
Practical Tips
Start simple: begin with a random intercept model. Only add random slopes if there's a theoretical reason to expect the effect to vary by group and you have enough groups (≥ 20). Complex random effects structures often fail to converge with football-sized datasets.
Check convergence: mixed models are optimised iteratively. Watch for convergence warnings in lme4 — singular fits (a variance component estimated at exactly 0) usually mean the random effect structure is too complex for the data. Simplify by removing correlations or random slopes.
Random or fixed? Treat a grouping variable as a random effect when: (1) the groups are a sample from a larger population you want to generalise to, (2) you have many groups (≥ 5), and (3) you care about the distribution of group effects, not each specific group. If you have 3 specific leagues and only care about those 3, use fixed effects.
Software: R has lme4::lmer() for linear models, lme4::glmer() for GLMMs, and brms::brm() for Bayesian fits. Python has statsmodels MixedLM, pymer4 (R's lme4 from Python), and bambi (Bayesian via PyMC).
Summary
- Fixed effects: what's true in general (shared parameters)
- Random effects: how groups deviate (group-specific parameters)
- Partial pooling shrinks small-sample groups toward the mean
- ICC measures how much variance is between groups
- Crossed random effects handle multi-factor designs
- GLMMs extend to non-normal outcomes (goals, shots, passes)
Football data is almost always hierarchical — players in teams, matches in seasons, teams in leagues. Standard regression ignores this structure and gives overconfident conclusions. Mixed effects models are the principled solution: they give you correct standard errors, automatic regularisation through partial pooling, and the ability to decompose variance at every level. If you're fitting any model to football data with repeated measurements on teams or players, you should probably be using mixed effects.