Monte Carlo Simulations
When you can't solve it analytically, simulate it 10,000 times and count

Why Monte Carlo?

Imagine you want to know the probability that Arsenal win the Premier League. You can't solve this with a formula — there are 380 matches, each with uncertain outcomes that depend on team strength, form, injuries, schedule congestion, and luck. The space of possible seasons is astronomically large.

Monte Carlo simulation sidesteps the impossibility of exact calculation with a beautifully simple idea: if you can simulate the process, you can estimate the answer. Generate thousands of random seasons, count how often Arsenal finish top, and that fraction is your probability estimate.

The Name

Named after the Monte Carlo Casino in Monaco. The connection is randomness: just as a roulette wheel generates random outcomes, Monte Carlo methods use random sampling to explore complex spaces. The name was suggested by Nicholas Metropolis, working alongside Stanislaw Ulam and John von Neumann on the Manhattan Project, where they used random sampling to solve neutron diffusion problems too complex for analytical solutions.

The power of Monte Carlo is its universality. It works for any problem where you can: (1) model the random process, and (2) run it many times. No closed-form solution needed. No assumptions about distributions being normal. No linearisation of nonlinear systems. Just simulate and count.

The Core Idea

Every Monte Carlo method follows the same pattern:

1. Define the model — what random variables are involved and how they interact
2. Sample — draw random values from the model's distributions
3. Compute — run the simulation forward and record the outcome
4. Aggregate — repeat N times, then compute statistics over the outcomes
[Figure: Monte Carlo estimation of π with 200 random points — π ≈ 4 × (inside / total) = 4 × 167 / 200 = 3.340 vs. true π = 3.14159…; error ∝ 1/√N, so 10,000 trials give roughly 1% error. No calculus needed — just random sampling and counting.]

The classic example is estimating π by throwing darts at a unit square containing a quarter-circle. The ratio of darts landing inside the circle to total darts, multiplied by 4, approximates π. It's not the most efficient way to compute π, but it illustrates the principle perfectly — turn a geometric problem into a counting problem.
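The dart-throwing estimator is a few lines of NumPy — a sketch of the counting idea, nothing more:

```python
import numpy as np

def estimate_pi(n, seed=0):
    """Estimate pi by sampling points uniformly in the unit square and
    counting how many fall inside the quarter-circle x^2 + y^2 <= 1."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    inside = (x**2 + y**2 <= 1.0).sum()
    return 4.0 * inside / n

estimate_pi(200)        # volatile at small N
estimate_pi(1_000_000)  # close to 3.14159 at large N
```

Because the error shrinks as 1/√N, going from 200 to 1,000,000 samples improves precision by roughly a factor of 70.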

The Mathematics

At its core, Monte Carlo estimates an expected value. If we want to compute:

E[f(X)] = ∫ f(x) p(x) dx

where f(x) is some function and p(x) is a probability distribution, the Monte Carlo estimator is:

Ê[f(X)] = (1/N) Σᵢ f(xᵢ), where xᵢ ~ p(x)

Draw N samples from p(x), evaluate f on each, and average. That's it. The Law of Large Numbers guarantees this converges to the true expectation as N → ∞.

The Central Limit Theorem Gives Error Bars

The error of the Monte Carlo estimate follows the Central Limit Theorem. The standard error is σ / √N, where σ is the standard deviation of f(X). This means:

  • 4× more samples → 2× less error (square root relationship)
  • The convergence rate O(1/√N) is independent of dimension
  • This is why MC shines in high dimensions where grid methods scale as O(1/N^(1/d))

Confidence Intervals

Because the CLT applies, we get free confidence intervals:

σ̂² = (1/(N-1)) Σᵢ (f(xᵢ) - Ê)²
95% CI: Ê ± 1.96 × σ̂/√N

In football terms: "Arsenal win the league in 32.4% ± 0.9% of simulations (95% CI)". The ±0.9% tells you how much your answer would change if you ran a different batch of 10,000 simulations.
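Those two formulas are a one-liner in practice. A sketch, using simulated 0/1 "Arsenal finished top" flags as stand-in data (the 32.4% rate is taken from the example above, not a real model):

```python
import numpy as np

def mc_estimate_with_ci(outcomes):
    """Mean and 95% CI for a Monte Carlo sample (0/1 flags or real values)."""
    outcomes = np.asarray(outcomes, dtype=float)
    n = outcomes.size
    est = outcomes.mean()
    se = outcomes.std(ddof=1) / np.sqrt(n)   # sigma-hat / sqrt(N)
    return est, (est - 1.96 * se, est + 1.96 * se)

rng = np.random.default_rng(42)
wins = rng.random(10_000) < 0.324            # stand-in "finished top?" flags
est, (lo, hi) = mc_estimate_with_ci(wins)
```

With N = 10,000 and p ≈ 0.32, the half-width works out to just under ±1% — matching the "32.4% ± 0.9%" statement above.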

Convergence

[Figure: convergence of the MC estimate of π as N grows from 10 to 10,000 — early estimates range between roughly 2.8 and 3.5, then stabilise toward 3.14159 inside a shrinking ±1/√N envelope.]

The diagram shows a key property of Monte Carlo: early estimates are volatile, but they stabilise as N grows. The envelope shrinks as 1/√N — halving the error requires 4× more samples.

How Many Samples Do You Need?

A useful rule of thumb: to estimate a probability p within ±ε at 95% confidence, you need roughly N ≈ 1.96² × p(1-p)/ε² ≈ 4p(1-p)/ε². For "P(Arsenal win league) ≈ 30%" with ±1% precision: N ≈ 4 × 0.3 × 0.7 / 0.01² ≈ 8,400. For ±0.1% precision: roughly 840,000. Rare events (p < 1%) need far more samples for useful relative precision — this is where variance reduction helps.
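The sample-size rule is easy to wrap as a helper (the z = 1.96 default corresponds to 95% confidence; swap in another z-value for other levels):

```python
import math

def samples_needed(p, eps, z=1.96):
    """Monte Carlo sample size to estimate a probability near p
    within +/- eps at the confidence level implied by z (1.96 ~ 95%)."""
    return math.ceil(z * z * p * (1 - p) / (eps * eps))

samples_needed(0.3, 0.01)   # +/- 1% precision
samples_needed(0.3, 0.001)  # +/- 0.1% precision, ~100x more samples
```

Note the ε² in the denominator: tightening the precision tenfold costs a hundredfold more simulations.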

The Curse of Rare Events

If you're estimating "P(bottom-placed team wins the league)" — something that happens in maybe 0.01% of simulations — you'd need millions of simulations to get a stable estimate. With 10,000 simulations you might see it 0 or 1 times, giving estimates of 0% or 0.01% purely due to sampling noise. This is where variance reduction techniques become essential.

Variance Reduction

The naive Monte Carlo estimator converges at O(1/√N), but we can do better without increasing N. The idea: reduce the variance σ² of each sample, so the same number of samples gives a tighter estimate.

[Figure: variance reduction techniques at N = 1,000 — naive MC with pure random sampling (σ = 0.051) vs. stratified sampling over divided regions (σ = 0.031, 40% less) vs. importance sampling concentrated where it matters (σ = 0.018, 65% less). Same N, better estimates — smart sampling beats brute force.]

1. Stratified Sampling

Divide the sample space into strata (regions), then sample from each stratum proportionally. This guarantees coverage of all regions and eliminates "clumping" where random samples might over-represent some areas.

Football example: When simulating match outcomes, stratify by match importance (early season vs. decisive late matches) so every phase of the season is represented in each simulation batch.
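A minimal one-dimensional sketch of the idea, estimating E[f(U)] for U ~ Uniform(0,1) by sampling each of k equal-width strata separately rather than letting draws clump:

```python
import numpy as np

def stratified_mean(f, n, k, seed=0):
    """Estimate E[f(U)], U ~ Uniform(0,1), with k equal-width strata.
    Each stratum gets n // k samples, guaranteeing full coverage."""
    rng = np.random.default_rng(seed)
    per = n // k
    edges = np.linspace(0.0, 1.0, k + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        u = rng.uniform(lo, hi, per)       # samples confined to this stratum
        total += f(u).mean() * (hi - lo)   # weight by the stratum's probability
    return total

est = stratified_mean(lambda x: x**2, 1000, 10)  # true E[U^2] = 1/3
```

Because f varies little within each narrow stratum, the per-stratum variance is tiny, and the combined estimate is far tighter than a naive one at the same N.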

2. Importance Sampling

Sample more from regions that contribute most to the quantity you're estimating, then reweight to correct the bias. The estimate becomes:

Ê = (1/N) Σᵢ f(xᵢ) × p(xᵢ) / q(xᵢ), where xᵢ ~ q(x)

where q(x) is a proposal distribution that places more weight on "interesting" regions. The ratio p(x)/q(x) corrects for the biased sampling.

Football example: To estimate the probability of a shock relegation for a top team, bias your simulations toward scenarios where they start poorly, then reweight each simulation by how likely that start was.
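A classic toy version of the same trick: estimating the tail probability P(X > 4) for X ~ N(0,1) — an event naive MC at N = 10,000 would almost never see. Sampling from a proposal q centred on the tail and reweighting by p/q recovers the answer:

```python
import numpy as np

def importance_tail_prob(n, threshold=4.0, seed=0):
    """Estimate P(X > threshold) for X ~ N(0,1) by sampling from the
    proposal q = N(threshold, 1), which puts most samples in the tail."""
    rng = np.random.default_rng(seed)
    x = rng.normal(threshold, 1.0, n)              # x_i ~ q
    # Importance weight p(x)/q(x) = exp(t^2/2 - t*x); the 1/sqrt(2*pi)
    # normalising constants cancel in the ratio
    w = np.exp(threshold**2 / 2 - threshold * x)
    return np.mean((x > threshold) * w)

est = importance_tail_prob(100_000)  # true value is about 3.17e-5
```

The relegation example in the text is the same mechanics: q biases simulations toward bad starts, and the p/q weight undoes the bias.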

3. Antithetic Variates

For every random sample, also use its "mirror image". If you draw U ~ Uniform(0,1), pair it with 1-U. The negative correlation between paired samples reduces variance — when one overestimates, its mirror tends to underestimate.
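A sketch for a uniform input, pairing every draw with its mirror; the exponential target here is just a toy with a known answer (E[e^U] = e − 1):

```python
import numpy as np

def antithetic_mean(f, n, seed=0):
    """Estimate E[f(U)], U ~ Uniform(0,1), pairing each draw U with 1 - U.
    For monotone f the pair is negatively correlated, cutting variance."""
    rng = np.random.default_rng(seed)
    u = rng.random(n // 2)                  # half the budget: the rest is mirrored
    return 0.5 * (f(u) + f(1.0 - u)).mean()

est = antithetic_mean(np.exp, 1000)
```

The technique is essentially free — the mirrored samples cost no extra random number generation — but it only helps when f is monotone in the input.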

4. Control Variates

Use a correlated variable with a known expectation to reduce variance. If Z is correlated with your target and E[Z] is known:

Ê_cv = Ê + c × (Z̄ - E[Z])

The correction term c × (Z̄ - E[Z]) exploits the known expectation to cancel out sampling noise.

Football example: When simulating league outcomes, use total goals scored as a control variate (known average ~2.7 per match in the PL). If a batch of simulations happens to generate unusually high-scoring matches, the control variate adjusts the result downward.
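A sketch on a toy target — E[e^U] with Z = U as the control, since E[Z] = 0.5 is known exactly — using the variance-minimising coefficient c = -Cov(f, Z)/Var(Z):

```python
import numpy as np

def control_variate_mean(f, n, seed=0):
    """Estimate E[f(U)] using Z = U (known E[Z] = 0.5) as a control variate."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    y = f(u)
    c = -np.cov(y, u)[0, 1] / np.var(u, ddof=1)  # variance-minimising c
    # If this batch's draws ran high (u.mean() > 0.5), the correction
    # term pulls the estimate back by a proportional amount
    return y.mean() + c * (u.mean() - 0.5)

est = control_variate_mean(np.exp, 10_000)  # true E[e^U] = e - 1
```

The goals-scored example in the text works the same way: total goals plays the role of Z, and its known league average anchors the correction.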

Season Simulation (The Killer App)

[Figure: the Monte Carlo season simulation pipeline — a match model (home goals ~ Poisson(λ₁), away goals ~ Poisson(λ₂), from team strengths and form) samples a scoreline per match; 380 matches with random scores yield one possible final table (points, GD); repeating 10,000× yields full distributions of outcomes, e.g. P(Arsenal win league) = 3,241/10,000 = 32.4%, P(Leeds relegated) = 2,187/10,000 = 21.9%, E[Man City points] = 87.3 ± 5.1, plus a full final-points distribution per team rather than a point estimate.]

Season simulation is the most widespread application of Monte Carlo in football analytics. FiveThirtyEight, Opta, and every serious analytics department use some version of this. Here's the pipeline:

Step 1: Match Model

For each of the 380 matches in a Premier League season, model the number of goals scored by each team. The Dixon-Coles model (or variants) is the standard approach:

Home goals ~ Poisson(λ₁)
Away goals ~ Poisson(λ₂)
λ₁ = exp(μ + αᵢ - δⱼ + γ)
λ₂ = exp(μ + αⱼ - δᵢ)

where αᵢ is team i's attack strength, δⱼ is team j's defence strength, μ is the intercept, and γ is home advantage. These parameters are estimated from historical match data.

Step 2: Generate Scorelines

For each match, draw random goals from the Poisson distributions. GW1 example: Arsenal (λ₁ = 2.1) vs Newcastle (λ₂ = 0.9) → sampled scoreline 3-1. Do this for all 380 matches.

Step 3: Build the Table

From the 380 simulated scorelines, compute points (W=3, D=1, L=0), goal difference, and rank. This gives one possible final league table.

Step 4: Repeat 10,000 Times

Each repetition generates a different season due to different random draws. After 10,000 simulations you have 10,000 final tables, from which you can compute:

  • Title probabilities — how often does each team finish 1st?
  • Relegation probabilities — how often does each team finish in the bottom 3?
  • Points distributions — expected points ± uncertainty for each team
  • Qualification odds — P(top 4), P(Europa League), P(Conference)

Why Not Just Use Expected Points?

Because the league table is a nonlinear function of match results. You can't just average expected goals and expected points — two teams with the same expected points can have very different title/relegation probabilities depending on their variance, remaining schedule, and head-to-head records. Monte Carlo captures all these interactions naturally.
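The four-step pipeline can be sketched end to end for a toy league. Everything here is a stand-in — the attack/defence strengths, home advantage, and intercept are hypothetical values plugged into the Step 1 equations, not fitted parameters:

```python
import numpy as np

def simulate_seasons(attack, defence, home_adv=0.25, mu=0.1,
                     n_sims=10_000, seed=42):
    """Steps 1-4: Poisson match model -> scorelines -> points tables,
    repeated n_sims times. Returns P(title) per team (points only;
    a real version would also track GD and head-to-head tiebreaks)."""
    rng = np.random.default_rng(seed)
    t = len(attack)
    points = np.zeros((n_sims, t))
    for i in range(t):          # home team
        for j in range(t):      # away team, double round robin
            if i == j:
                continue
            lam_home = np.exp(mu + attack[i] - defence[j] + home_adv)
            lam_away = np.exp(mu + attack[j] - defence[i])
            gh = rng.poisson(lam_home, n_sims)  # this fixture in every season
            ga = rng.poisson(lam_away, n_sims)
            points[:, i] += np.where(gh > ga, 3, np.where(gh == ga, 1, 0))
            points[:, j] += np.where(ga > gh, 3, np.where(gh == ga, 1, 0))
    champions = points.argmax(axis=1)           # ties go to the lower index
    return np.bincount(champions, minlength=t) / n_sims

# Hypothetical strengths: team 0 is clearly the strongest side
p_title = simulate_seasons(attack=np.array([0.4, 0.1, 0.0, -0.2]),
                           defence=np.array([0.3, 0.1, 0.0, -0.1]))
```

Each column of `points` is one team's points across all simulated seasons, so relegation odds or points distributions fall out of the same array.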

Football Applications

1. Match Result Probabilities

The building block of season simulation. Given team strengths, simulate scorelines to get P(Home win), P(Draw), P(Away win). Compare to bookmaker odds to find value bets. The Poisson model gives match odds; MC gives joint outcomes (both teams to score, over/under, correct score).

2. Expected Goals (xG) Aggregation

A team creates 15 chances with xG values [0.05, 0.12, 0.03, 0.45, ...]. What's the probability they score exactly 2 goals? You can't just sum xG — each shot is independent with its own probability. Monte Carlo: simulate each shot as a Bernoulli trial, count goals, repeat 10,000 times.
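A sketch of that shot-by-shot simulation — the xG values below are made up for illustration:

```python
import numpy as np

def goal_distribution(xg_values, n_sims=10_000, seed=0):
    """Simulate each shot as an independent Bernoulli(xG) trial and
    return the distribution of total goals across n_sims simulations."""
    rng = np.random.default_rng(seed)
    xg = np.asarray(xg_values)
    # One uniform draw per shot per simulation; a goal when draw < xG
    goals = (rng.random((n_sims, xg.size)) < xg).sum(axis=1)
    return np.bincount(goals, minlength=xg.size + 1) / n_sims

dist = goal_distribution([0.05, 0.12, 0.03, 0.45, 0.30, 0.08])
p_exactly_2 = dist[2]   # P(score exactly 2 goals)
```

(For independent shots this distribution is Poisson-binomial, which also has exact recursive formulas — the MC version is just the simplest route and extends to correlated shots.)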

3. Tournament Simulation

Champions League knockout stages, World Cup groups + brackets — these have complex dependency structures (who you face in the semis depends on other results). Monte Carlo handles this naturally: simulate every match, advance winners, repeat. Get full probability distributions for every team reaching every round.

4. In-Game Win Probability

At 65 minutes with the score 1-1, what's the probability each team wins? Model remaining goals as a Poisson process with time-adjusted rates (teams score differently when chasing vs. protecting leads). Simulate the remaining 25 minutes thousands of times — instant live win probability, accounting for red cards, substitutions, and game state.
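A minimal sketch of that idea, assuming constant Poisson scoring rates scaled to the time remaining — a real model would adjust rates for game state, cards, and substitutions, and the per-90 rates here are hypothetical:

```python
import numpy as np

def live_win_prob(score_home, score_away, mins_left,
                  rate_home, rate_away, n_sims=10_000, seed=0):
    """Simulate remaining goals as Poisson draws with per-90 rates
    scaled to the minutes left, then count final-result frequencies."""
    rng = np.random.default_rng(seed)
    gh = score_home + rng.poisson(rate_home * mins_left / 90, n_sims)
    ga = score_away + rng.poisson(rate_away * mins_left / 90, n_sims)
    return {"home": (gh > ga).mean(),
            "draw": (gh == ga).mean(),
            "away": (gh < ga).mean()}

# 1-1 at 65 minutes, with hypothetical per-90 scoring rates
probs = live_win_prob(1, 1, mins_left=25, rate_home=1.6, rate_away=1.2)
```

The three probabilities sum to one by construction, since every simulated finish is exactly one of the outcomes.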

5. Player Value & Contract Modelling

What's a player worth over a 5-year contract? Model uncertain future performance (injury risk, age decline, development trajectory) and market conditions. Each simulation path gives a different career trajectory → different financial value. The distribution of outcomes tells you expected value and risk.

6. Bankroll Management (Kelly Criterion)

Given your edge estimates and their uncertainty, simulate thousands of betting seasons to find optimal stake sizes. The Kelly criterion gives an analytical answer for simple cases, but Monte Carlo handles correlated bets, multiple simultaneous markets, drawdown constraints, and estimation error in your edge — things no formula can capture.

Markov Chain Monte Carlo (MCMC)

Standard Monte Carlo requires you to sample directly from the distribution. But what if you can't? Bayesian posteriors, for example, are often complex high-dimensional distributions with no closed-form sampling method.

MCMC solves this by constructing a Markov chain whose stationary distribution is your target. Instead of independent samples, you take a random walk through parameter space where each step depends only on the current position. After enough steps ("burn-in"), the chain's samples are approximately from the target distribution.

Metropolis-Hastings Algorithm
1. Start at position θ₀
2. Propose θ* ~ q(θ*|θₜ) — e.g. θ* = θₜ + ε, ε ~ Normal(0, σ²)
3. Accept with probability α = min(1, p(θ*)/p(θₜ)) (valid for a symmetric proposal like the random walk in step 2; otherwise include the ratio q(θₜ|θ*)/q(θ*|θₜ))
4. If accepted: θₜ₊₁ = θ* — else: θₜ₊₁ = θₜ
5. Repeat for thousands of steps

The key insight: you only need to compute the ratio of probabilities, never the normalisation constant. This makes MCMC practical for Bayesian inference where the posterior P(θ|D) ∝ P(D|θ)P(θ) is known only up to a constant.
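The five steps above are a few lines of code. A minimal random-walk Metropolis sketch, targeting a standard normal whose density we pretend to know only up to a constant (working in log space for numerical stability):

```python
import numpy as np

def metropolis_hastings(log_p, theta0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler. log_p is the unnormalised
    log-density: only ratios p(theta*)/p(theta_t) are ever needed,
    so the normalising constant can stay unknown."""
    rng = np.random.default_rng(seed)
    theta = theta0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        proposal = theta + rng.normal(0.0, step_size)  # symmetric proposal
        # Accept with prob min(1, p(proposal)/p(theta)), done in log space
        if np.log(rng.random()) < log_p(proposal) - log_p(theta):
            theta = proposal
        samples[t] = theta          # on rejection, the chain stays put
    return samples

# Target: standard normal, "known" only up to a constant
samples = metropolis_hastings(lambda x: -0.5 * x**2, theta0=0.0,
                              n_steps=20_000)
post_burn = samples[5_000:]         # discard burn-in
```

After burn-in, the retained samples have mean ≈ 0 and standard deviation ≈ 1, matching the target — despite never evaluating a normalised density.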

MC vs. MCMC
  • Monte Carlo: independent samples from a known distribution → estimate expectations
  • MCMC: correlated samples from an unknown distribution → approximate the distribution itself
  • Season simulation uses MC (sample from Poisson). Bayesian parameter estimation uses MCMC.
  • Both converge to the right answer, but MCMC needs more samples due to autocorrelation

Practical Tips

Vectorise, Don't Loop

Simulating 10,000 seasons with a Python for-loop is slow. Instead, generate all random numbers at once as a matrix (N_sims × N_matches) and use NumPy/PyTorch vectorised operations. A season simulation that takes 10 minutes with loops takes 2 seconds vectorised.
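For instance, a single call can draw every match of every simulated season at once; the goal rates here are placeholder constants (a real model would supply one λ per fixture):

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims, n_matches = 10_000, 380
lam_home = np.full(n_matches, 1.5)   # hypothetical per-match goal rates
lam_away = np.full(n_matches, 1.1)

# One vectorised call: all 3.8M home scores, no Python loop
home_goals = rng.poisson(lam_home, size=(n_sims, n_matches))
away_goals = rng.poisson(lam_away, size=(n_sims, n_matches))

# Row i is simulated season i; reduce across matches per season
home_wins_per_season = (home_goals > away_goals).sum(axis=1)
```

Broadcasting the (n_matches,) rate vector against the (n_sims, n_matches) output shape is what replaces the double loop.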

Set Seeds for Reproducibility

Always seed your random number generator, e.g. rng = np.random.default_rng(42) (the Generator API is preferred over the legacy np.random.seed(42)), so results are reproducible. Different seeds give different results — if your conclusion changes with the seed, you need more samples.

Check Convergence

Run with N=1,000 and N=10,000. If the results change meaningfully, you need more samples. Plot your estimate vs. N (as in the convergence diagram above) — it should stabilise. For MCMC, check trace plots and the Gelman-Rubin R̂ diagnostic (< 1.01).

Model Uncertainty Propagation

Don't just simulate match outcomes — simulate parameter uncertainty too. If your attack strength estimate is α = 0.35 ± 0.12, draw α from its posterior in each simulation. This propagates estimation uncertainty through to the final predictions, giving wider (more honest) confidence intervals.

Watch for Correlated Inputs

In a season simulation, if one team is stronger than expected, their opponents' results are correlated (they all lose to that team). Generate team strengths once per simulation, then condition all match results on those strengths. Don't draw independent match outcomes — that destroys the correlation structure.

When to Use Monte Carlo

✅ Use MC When
  • No closed-form solution exists
  • High-dimensional integrals
  • Complex dependency structures
  • Nonlinear transformations of random variables
  • You need full distributions, not just means
  • The model has conditional logic (if-then paths)
⚡ Prefer Analytical When
  • A closed-form solution exists (use it!)
  • The problem is low-dimensional (< 3D)
  • Numerical integration is tractable
  • Speed is critical (real-time systems)
  • You only need the mean or variance
  • Exact answers are required (not estimates)
The Hybrid Approach

The best implementations use analytical solutions where possible and Monte Carlo for the rest. For example: use the Poisson PMF to compute exact match probabilities (analytical), then use Monte Carlo to propagate those through the league structure (where exact calculation of 380 dependent matches is intractable).

Summary

Key Concepts
  • MC estimates expectations by random sampling + averaging
  • Convergence: error ∝ 1/√N (dimension-independent)
  • Variance reduction: stratified, importance, antithetic, control variates
  • MCMC: sample from distributions you can't sample directly
  • Season simulation: the "killer app" in football analytics
  • Always report confidence intervals with MC estimates
Key Equations
Ê[f(X)] = (1/N) Σᵢ f(xᵢ)
SE = σ̂ / √N
95% CI: Ê ± 1.96 × SE
N ≈ 1.96² p(1-p) / ε² (sample size for ±ε at 95% confidence)
α = min(1, p(θ*)/p(θₜ)) (Metropolis-Hastings)
Key Takeaway

Monte Carlo is the Swiss Army knife of computational statistics. When the maths gets too hard to solve analytically — when there are too many dimensions, too many dependencies, too many conditional paths — you can almost always fall back to "simulate it and count". In football analytics, this means you never have to settle for point estimates. Every prediction can come with a full probability distribution, uncertainty bounds, and risk assessment. The cost is just compute time and random number generation — both of which are essentially free in 2026.