Why Mixed Effects?
You want to know whether playing at home increases a team's expected goals. You have 10,000 match observations across 20 teams over 10 seasons. A standard linear regression would treat all 10,000 observations as independent — but they're not. Matches from the same team are correlated (Man City's home xG is consistently higher than Burnley's). Matches from the same season share conditions (COVID empty stadiums reduced home advantage league-wide).
Mixed effects models (also called hierarchical linear models, multilevel models, or random effects models) handle this nested structure explicitly. They model both the population-level pattern (fixed effects) and the group-level variation (random effects), giving you correct standard errors, better predictions for small-sample groups, and the ability to quantify variation at each level of the hierarchy.
"Mixed" because the model contains a mixture of fixed effects (parameters that are the same for everyone — like the effect of being at home) and random effects (parameters that vary by group — like each team's baseline quality). The fixed effects tell you "what's true in general"; the random effects tell you "how does each group deviate from the general pattern".
The Problem with Ignoring Structure
What goes wrong if you just run ordinary linear regression on nested data?
Standard regression assumes independent errors. With 380 matches per season from 20 teams, you don't have 380 independent observations — you have ~20 clusters of correlated data. Treating them as independent underestimates standard errors by a factor of 2-5×, making everything look "significant" when it isn't.
The relationship between variables can be completely different at different levels. Across teams, higher possession correlates with more goals. But within a team, having 70% vs 55% possession in a specific match might not predict more goals at all — they might just be dominating a weak opponent. Mixed models separate between-group and within-group effects.
A promoted team has only 10 matches of data. Fitting a separate model gives noisy estimates. Pooling all teams ignores that they're different. Mixed models find the sweet spot: partial pooling — borrow strength from other teams while still allowing the promoted team its own estimate.
The Mathematics
Random Intercept Model
The simplest mixed effects model adds a random intercept for each group. For observation i in group j:

yᵢⱼ = β₀ + β₁xᵢⱼ + uⱼ + εᵢⱼ,  with uⱼ ~ N(0, σ²ᵤ) and εᵢⱼ ~ N(0, σ²ₑ)

- β₀: grand mean across all groups — "the average team's baseline"
- β₁: effect of x on y, shared by all groups — "the universal relationship"
- uⱼ: group j's deviation from the grand mean — "how team j differs"
- σ²ᵤ: how much groups differ from each other — "how variable are teams?"
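A quick simulation sketch of a random-intercept data-generating process (all parameter values here are illustrative, not estimates from real football data). It shows that the variance of the group means recovers the between-group component σ²ᵤ once the noise the means inherit is subtracted:

```python
import numpy as np

# Simulate y_ij = b0 + b1*x_ij + u_j + e_ij with illustrative parameters.
rng = np.random.default_rng(42)

n_teams, n_matches = 200, 50
beta0, beta1 = 1.4, 0.03          # grand mean, shared slope
sigma_u, sigma_e = 0.3, 0.5       # between-team and within-team SDs

u = rng.normal(0.0, sigma_u, n_teams)               # team deviations u_j
x = rng.normal(10.0, 3.0, (n_teams, n_matches))     # predictor (e.g. shots)
eps = rng.normal(0.0, sigma_e, (n_teams, n_matches))
y = beta0 + beta1 * x + u[:, None] + eps

# Variance of the team means (after removing the known slope term)
# estimates sigma_u^2 plus the noise the means inherit, sigma_e^2 / n.
team_means = (y - beta1 * x).mean(axis=1)
between_var = team_means.var(ddof=1) - sigma_e**2 / n_matches
print(f"estimated sigma_u^2 = {between_var:.3f} (true {sigma_u**2:.3f})")
```

With many groups the estimate is tight; with 20 teams it would be far noisier, which is exactly why the number of groups matters so much later on.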
Random Slope Model
Sometimes the effect of a predictor varies by group, not just the baseline. Add a random slope:

yᵢⱼ = β₀ + β₁xᵢⱼ + u₀ⱼ + u₁ⱼxᵢⱼ + εᵢⱼ,  with (u₀ⱼ, u₁ⱼ) ~ N(0, Σ)
Now each team has its own intercept and its own slope. The covariance matrix Σ captures the correlation between intercepts and slopes (e.g., teams with higher baselines might have flatter slopes — ceiling effects).
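A small sketch of drawing correlated (intercept, slope) pairs from Σ; the negative off-diagonal encodes the ceiling effect described above. The numbers are purely illustrative:

```python
import numpy as np

# Draw correlated deviations (u0_j, u1_j) ~ N(0, Sigma). The negative
# covariance means high-baseline teams tend to get flatter slopes.
rng = np.random.default_rng(7)
Sigma = np.array([[0.30, -0.04],
                  [-0.04, 0.02]])
effects = rng.multivariate_normal([0.0, 0.0], Sigma, size=5000)

sample_cov = np.cov(effects.T)   # should be close to Sigma
print(sample_cov.round(3))
```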
Crossed vs. Nested Random Effects
Football data often has crossed random effects, not just nested ones:
In a crossed design, every group at one level can appear with any group at another (any team can face any team, any referee can officiate any match). This is different from nesting (where players belong to exactly one team). Mixed models handle both.
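A toy illustration of a crossed design (team and referee names are made up): every team appears both home and away, and any referee can be assigned to any fixture, so neither factor nests inside the other:

```python
from itertools import product

# Toy crossed design: any team can face any other team, and any referee
# can officiate any fixture.
teams = ["ARS", "BUR", "LIV", "MCI"]
referees = ["Ref A", "Ref B"]

matches = [
    {"home": h, "away": a, "referee": r}
    for (h, a), r in product(product(teams, teams), referees)
    if h != a
]

# The factors cross rather than nest, so a model of these matches needs
# a separate random effect for teams and for referees.
print(len(matches))   # 4 * 3 * 2 = 24 fixtures
```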
Partial Pooling (The Key Insight)
The central magic of mixed effects models is partial pooling — also called shrinkage. There are three approaches to estimating group-level parameters:
No pooling: fit a separate model for each group. Each team gets its own intercept and slope from its own data only. Problem: teams with 10 matches get wildly noisy estimates.
Complete pooling: ignore groups entirely — one model for all data. Every team is assumed identical. Problem: ignores real differences between City and Burnley.
Partial pooling: each group's estimate is a weighted average of its own data and the global mean. The weight depends on sample size: groups with lots of data keep their own estimate, groups with little data get "shrunk" toward the grand mean. This is automatic regularisation — the model learns how much to trust each group.
For a random intercept model, the group estimate is:

β̂ⱼ = wⱼ·ȳⱼ + (1 − wⱼ)·ȳ,  where wⱼ = nⱼ / (nⱼ + σ²ₑ/σ²ᵤ)
The shrinkage factor nⱼ / (nⱼ + σ²ₑ/σ²ᵤ) is near 1 when nⱼ is large (trust the group) and near 0 when nⱼ is small (trust the global mean). The ratio σ²ₑ/σ²ᵤ controls the "reliability" — if groups are very different (high σ²ᵤ), less shrinkage occurs.
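The shrinkage factor is simple enough to compute directly. A sketch with illustrative variance components (not fitted values), comparing a promoted team with 10 matches to an established team with 380:

```python
# Shrinkage weight w_j = n_j / (n_j + sigma_e^2 / sigma_u^2).
# The variance components below are illustrative, not fitted values.
sigma_e2 = 0.25   # within-team (residual) variance
sigma_u2 = 0.09   # between-team variance

def shrinkage_weight(n_j: int) -> float:
    """Weight kept on the group's own mean; 1 - w_j goes to the grand mean."""
    return n_j / (n_j + sigma_e2 / sigma_u2)

w_new, w_old = shrinkage_weight(10), shrinkage_weight(380)
print(f"10 matches: keep {w_new:.2f}; 380 matches: keep {w_old:.2f}")

# Partial-pooling estimate: weighted average of team mean and league mean.
team_mean, league_mean = 2.1, 1.4     # hypothetical xG per match
shrunk = w_new * team_mean + (1 - w_new) * league_mean
print(f"shrunk estimate for the promoted team: {shrunk:.2f}")
```

The promoted team's raw mean of 2.1 is pulled noticeably toward the league mean, while a team with a full data history keeps nearly all of its own estimate.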
Estimation Methods
Mixed models can't be fitted with ordinary least squares — the variance components (σ²ᵤ, σ²ₑ) must be estimated alongside the fixed effects. The main approaches:
Maximum likelihood (ML): maximise the marginal likelihood with random effects integrated out. Tends to underestimate variance components because it doesn't account for the degrees of freedom used by the fixed effects.
Restricted maximum likelihood (REML): the standard choice. Corrects ML's bias by maximising the likelihood of a transformed set of residuals. Gives unbiased variance estimates. Used by default in R's lme4 and Python's statsmodels.
Bayesian estimation: place priors on all parameters (fixed effects, variance components) and sample from the posterior via MCMC. Gives full uncertainty quantification — especially valuable when you have few groups (e.g., 5 leagues) where frequentist variance estimates are unreliable. R's brms makes this easy.
A common rule of thumb: you need at least 5-10 groups to estimate between-group variance reliably with REML. With 3 leagues, the σ²ᵤ estimate will be very uncertain — consider Bayesian estimation with an informative prior, or just use fixed effects for the groups instead.
Model Specification in Practice
Mixed effects models are typically specified using the Wilkinson formula notation. Here's how common football models map to R/Python syntax:
- xG ~ home + possession + (1 | team) — each team gets its own baseline xG; home and possession effects are shared.
- xG ~ home + possession + (1 + possession | team) — each team has its own baseline AND its own possession→xG relationship.
- goals ~ home + (1 | attack_team) + (1 | defence_team) + (1 | referee) — separate random effects for home attack, away defence, and referee bias.
- performance ~ (1 | league/team/player) — players nested in teams nested in leagues; each level adds its own random intercept.
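As a sketch, here is a random-intercept model fitted on simulated data with statsmodels' formula interface, where the groups= argument plays the role of lme4's (1 | team). All parameter values are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate 20 teams x 100 matches with a team random intercept and a
# true home effect of 0.25 (illustrative numbers throughout).
rng = np.random.default_rng(0)
team = np.repeat(np.arange(20), 100)
u = rng.normal(0.0, 0.3, 20)[team]          # team random intercepts
home = rng.integers(0, 2, team.size)
xg = 1.3 + 0.25 * home + u + rng.normal(0.0, 0.5, team.size)
df = pd.DataFrame({"xg": xg, "home": home, "team": team})

# "xg ~ home" gives the fixed effects; groups= adds (1 | team).
fit = smf.mixedlm("xg ~ home", df, groups=df["team"]).fit()
print(fit.params["home"])   # recovers roughly the true 0.25
```

The fitted summary also reports the between-team variance ("Group Var"), which is the σ²ᵤ that partial pooling depends on.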
Intraclass Correlation (ICC)
The ICC tells you how much of the total variance is explained by the grouping structure:

ICC = σ²ᵤ / (σ²ᵤ + σ²ₑ)

- Low ICC (near 0): groups are similar — pooling is fine.
- Moderate ICC: clustering matters — a mixed model is warranted.
- High ICC: strong group effects — ignoring them is dangerous.
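A minimal helper computing the ICC from the two variance components (the input values here are hypothetical):

```python
def icc(sigma_u2: float, sigma_e2: float) -> float:
    """Intraclass correlation: share of total variance between groups."""
    return sigma_u2 / (sigma_u2 + sigma_e2)

# Hypothetical variance components from a fitted team model:
print(round(icc(0.09, 0.25), 2))   # about a quarter of the variance is between teams
```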
Football Applications
The classic goals model is inherently a mixed model: home goals ~ Poisson(λ₁) where log(λ₁) = μ + αᵢ − δⱼ + γ, with γ the home-advantage term. The attack strengths αᵢ and defence strengths δⱼ are random effects — they're team-specific deviations drawn from a common population. Treating them as random rather than fixed gives partial pooling: newly promoted teams get shrunk toward the league average, which produces much better early-season predictions.
Separating player ability from team context. A player's per-match xG contribution depends on their ability and the system they play in. Model: xG ~ position + age + (1 | player) + (1 | team). The player random effect captures ability after controlling for team quality. Crucial for transfer valuation — you want the player effect, not the "playing for Man City" effect.
Do referees systematically favour home teams? Model: fouls ~ home + (1 | referee) + (1 | team). Crossed random effects let you estimate each referee's home-bias deviation while controlling for teams. Some referees might give 0.3 more fouls to away teams than average; the mixed model quantifies this with proper uncertainty bounds.
Comparing xG models across leagues. Players and teams are nested within leagues; leagues have different styles. A three-level model (1 | league/team/player) decomposes variance: how much is league-level (style), team-level (quality), and player-level (individual talent)? Essential for scouting players from lower-tier leagues.
Home advantage isn't constant — it varies by team (atmosphere), by league (culture), and over time (COVID showed ~40% reduction). Model: goals ~ (1 + home | team) + (1 | season). Random slopes let each team have its own home boost; season random effects capture temporal shifts. You discover that Anfield is worth +0.4 xG while the Amex is +0.1.
Growth curve modelling: performance ~ age + age² + (1 + age | player). The fixed effects give the average career arc (peak at ~27). Random slopes let each player have their own trajectory — some peak at 24, others at 30. Predict future performance by extrapolating each player's random slope. Better than one-size-fits-all aging curves.
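The peak of the quadratic fixed-effect arc falls at age = −β₁/(2β₂), and a player's random slope deviation shifts their personal peak. A sketch with illustrative coefficients chosen so the population peak lands near 27 (these are not fitted values):

```python
# performance = b0 + (b1 + slope_dev)*age + b2*age^2 peaks where the
# derivative is zero: age = -(b1 + slope_dev) / (2*b2).
# Coefficients are illustrative, chosen so the population peak is ~27.
b1, b2 = 0.54, -0.01

def peak_age(slope_dev: float = 0.0) -> float:
    """Peak age for a player with the given random slope deviation."""
    return -(b1 + slope_dev) / (2 * b2)

print(peak_age(0.0))     # population peak, around 27
print(peak_age(-0.06))   # an early peaker
print(peak_age(+0.06))   # a late peaker
```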
Generalised Linear Mixed Models (GLMMs)
Most football outcomes aren't normally distributed. Goals are counts (Poisson), shots on target are yes/no (Bernoulli), pass completion is a proportion (Binomial). GLMMs combine mixed effects with non-normal distributions via a link function:
- Poisson: goals are counts. log(λᵢⱼ) = β₀ + β₁·home + u₀ⱼ. The Dixon-Coles model with partial pooling.
- Bernoulli (logistic): shot conversion is yes/no. logit(pᵢⱼ) = β₀ + β₁·distance + u₀ⱼ. xG model with player random effects.
- Binomial: pass completion is a proportion (k/n). logit(pᵢⱼ) = β₀ + β₁·pressure + u₀ⱼ. Pass-accuracy model per player.
- Negative binomial: shots are overdispersed counts. Handles teams with high variance in shot volume.
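The link functions themselves are one-liners. A sketch showing how the same linear predictor means different things under each link:

```python
import math

# Links used by the GLMMs above: log for Poisson rates, logit for
# Bernoulli/Binomial probabilities.
def logit(p: float) -> float:
    return math.log(p / (1 - p))

def expit(eta: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

# A linear predictor of -0.4 is a rate of about 0.67 goals on the log
# scale, but a probability of about 0.40 through the inverse logit.
print(math.exp(-0.4), expit(-0.4))
```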
Practical Tips
Start simple: begin with a random intercept model. Only add random slopes if there's a theoretical reason to expect the effect to vary by group and you have enough groups (≥ 20). Complex random effects structures often fail to converge with football-sized datasets.
Check convergence: mixed models are optimised iteratively. Watch for convergence warnings in lme4 — singular fits (a variance component estimated at exactly 0) usually mean the random effect structure is too complex for the data. Simplify by removing correlations or random slopes.
Random or fixed? Treat a grouping variable as a random effect when: (1) the groups are a sample from a larger population you want to generalise to, (2) you have many groups (≥ 5), and (3) you care about the distribution of group effects, not each specific group. If you have 3 specific leagues and only care about those 3, use fixed effects.
Software: R has lme4::lmer() for linear models, lme4::glmer() for GLMMs, and brms::brm() for Bayesian fits. Python has statsmodels MixedLM, pymer4 (R's lme4 from Python), and bambi (Bayesian via PyMC).
Summary
- Fixed effects: what's true in general (shared parameters)
- Random effects: how groups deviate (group-specific parameters)
- Partial pooling shrinks small-sample groups toward the mean
- ICC measures how much variance is between groups
- Crossed random effects handle multi-factor designs
- GLMMs extend to non-normal outcomes (goals, shots, passes)
Football data is almost always hierarchical — players in teams, matches in seasons, teams in leagues. Standard regression ignores this structure and gives overconfident conclusions. Mixed effects models are the principled solution: they give you correct standard errors, automatic regularisation through partial pooling, and the ability to decompose variance at every level. If you're fitting any model to football data with repeated measurements on teams or players, you should probably be using mixed effects.