⚽💷📊
March Madness: Spreads Beat ML
How 15 iterations taught us that Vegas spread lines crush hand-crafted features — and what we learned about knowing when to stop engineering.
Overview

This model was built for the 2026 Kaggle March Machine Learning Mania competition. The objective: predict the win probability for every possible NCAA tournament matchup (men's and women's) and minimise the Brier score — the mean squared error between predicted probabilities and actual binary outcomes.

Our final model (V14) achieved a combined Brier score of 0.132, using Vegas spread lines as the primary signal with a small Elo-based correction. The journey to get there spanned 15 iterations and taught us a key lesson: market-derived features dominate hand-crafted box-score statistics.

The Iteration Journey
15 versions, 3 distinct phases, 1 clear winner

V1 – V6: Box-Score Features & Traditional ML

The first wave explored traditional machine learning built on NCAA box-score statistics — field goal percentages, rebounds, turnovers, assist-to-turnover ratios, and tempo. We engineered season-level aggregates for each team, then trained models (logistic regression, gradient boosting) to predict outcomes.

Results were middling. The models captured broad seed-tier effects (1-seeds beat 16-seeds) but struggled with the nuanced mid-bracket matchups that separate good submissions from great ones. V6 showed signs of overfitting with a per-gender tuning approach that didn't generalise out of sample.

V7 – V11: Elo Ratings & Seed Priors

The second phase introduced two interpretable features that dramatically improved performance:

Custom Elo System
  • K-factor of 20
  • Home-court advantage: +3.5 points
  • Margin multiplier: log(|m|+1) × 2.2 / (|Δelo| × 0.006 + 2.2)
  • Season mean reversion: 65% toward 1500
  • Regular-season cutoff at day 133
Seed Matchup Priors
  • Historical win rates per seed-vs-seed pair
  • Minimum 5 games to use the historical rate
  • Linear fallback: +3% per seed difference
  • e.g. 1-vs-8 seeds: ~80% historical win rate
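The core of the Elo system above can be sketched in a few lines. This is a minimal illustration, not the competition code: the function names are ours, and the home-court bump and day-133 cutoff are omitted for brevity.

```python
import math

K = 20            # K-factor from the list above
MEAN = 1500.0     # ratings revert toward this between seasons
REVERSION = 0.65  # 65% mean reversion

def margin_multiplier(margin, elo_diff):
    """log(|m|+1) x 2.2 / (|delta_elo| x 0.006 + 2.2): rewards big wins,
    but dampens blowouts by teams that were already heavy Elo favourites."""
    return math.log(abs(margin) + 1) * 2.2 / (abs(elo_diff) * 0.006 + 2.2)

def update(elo_winner, elo_loser, margin):
    """Post-game ratings: the winner gains exactly what the loser sheds."""
    expected = 1.0 / (1.0 + 10 ** ((elo_loser - elo_winner) / 400))
    delta = K * margin_multiplier(margin, elo_winner - elo_loser) * (1.0 - expected)
    return elo_winner + delta, elo_loser - delta

def season_reset(elo):
    """Apply the 65% reversion toward 1500 at the start of a new season."""
    return elo + REVERSION * (MEAN - elo)
```

Note the zero-sum update: ratings only redistribute points, while the season reset pulls everyone back toward the 1500 mean.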

V11 combined these via blending (Elo, seeds, and an earlier submission as a regularisation anchor), achieving meaningful improvement. However, Brier scores plateaued around 0.14–0.15.

V14 – The Breakthrough: Vegas Spreads

The breakthrough came from a simple insight: Vegas spread lines encode an enormous amount of information. Oddsmakers aggregate injury reports, travel schedules, matchup-specific tendencies, motivation, and public betting patterns — information that no box-score model can capture from summary statistics alone.

V14 Architecture
Spread-first with an Elo nudge
1. Spread → Probability Conversion

Convert each game's point spread to a win probability using a normal CDF:

P(lower_seed_wins) = Φ(−spread / σ), where σ = 11.5

σ = 11.5 was calibrated via leave-one-season-out cross-validation across historical tournament odds data.
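As a minimal sketch of this conversion (using the stdlib `statistics.NormalDist`; the function name is ours, not from the competition code):

```python
from statistics import NormalDist

SIGMA = 11.5  # calibrated spread-to-probability scale (see above)

def spread_to_prob(spread, sigma=SIGMA):
    """Win probability for the lower seed from its point spread.
    Convention assumed here: negative spread = lower seed favoured."""
    return NormalDist(mu=0.0, sigma=sigma).cdf(-spread)
```

A pick'em game (spread 0) maps to 0.5, and a 7.5-point favourite to roughly 0.74.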

2. Elo Disagreement Nudge

Where Elo disagrees with the spread-implied probability, apply a small correction:

P_final = P_spread + α × (P_elo − P_spread)

Leave-one-year-out validation showed α = 0.05–0.10 was optimal for men's games — a very light touch. Larger α values degraded performance, confirming the spread already captures most of what Elo measures.
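The correction is a one-line linear blend; a sketch with a hypothetical name:

```python
def nudge(p_spread, p_elo, alpha=0.05):
    """Shrink the spread-implied probability a small step toward Elo's view."""
    return p_spread + alpha * (p_elo - p_spread)
```

With α = 0.05, a spread probability of 0.70 against an Elo estimate of 0.60 moves only to 0.695 — the market keeps 95% of the vote.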

3. Women's Blending

Women's tournament spreads are less efficient (thinner betting markets). For women's games, we blended 50/50 with a prior submission as regularisation.

4. No-Spread Fallback

When no spread was available, we fell back to: 40% Elo + 30% seed priors + 30% V1 baseline. All probabilities clipped to [0.005, 0.995] to avoid catastrophic Brier penalties on upsets.
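The fallback blend and the clipping step are simple enough to show directly; this is a sketch and the names are ours:

```python
def fallback_prob(p_elo, p_seed, p_v1):
    """40% Elo + 30% seed prior + 30% V1 baseline, used when no spread exists."""
    return 0.40 * p_elo + 0.30 * p_seed + 0.30 * p_v1

def clip_prob(p, lo=0.005, hi=0.995):
    """Keep probabilities off the extremes to limit the Brier penalty on upsets."""
    return min(max(p, lo), hi)
```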

Sigma Calibration
Data-driven choice of the spread-to-probability parameter

The choice of σ = 11.5 was determined via grid search across σ ∈ [8.0, 20.0] for each historical tournament year:

Season    Optimal σ    Brier Score
2021      11.0         0.118
2022      12.0         0.137
2023      11.5         0.126
2024      11.5         0.131

σ = 11.5 was consistently near-optimal and was selected as the fixed parameter.
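The per-season search could be reproduced with something like the sketch below (pure stdlib; `best_sigma` is our name, and the spread/outcome inputs are hypothetical arrays, one entry per game in a season):

```python
from statistics import NormalDist

def brier(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def best_sigma(spreads, outcomes, grid=None):
    """Grid-search sigma over [8.0, 20.0] in 0.5 steps, minimising Brier."""
    grid = grid or [8.0 + 0.5 * i for i in range(25)]
    return min(grid, key=lambda s: brier(
        [NormalDist(0.0, s).cdf(-x) for x in spreads], outcomes))
```

Note that if every favourite wins, the smallest σ in the grid is optimal (sharper probabilities on the correct side always score better); real upsets are what push the optimum up toward 11–12.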

Submission Variants
Variant        Description
pure_spread    Spread only, no correction
elo_005        Spread + 5% Elo nudge
elo_010        Spread + 10% Elo nudge
elo_015        Spread + 15% Elo nudge
seed_90        90% spread + 10% seed matchup prior
V15: The ML Detour
Why more complexity made things worse

After V14's success, we attempted a more sophisticated approach: a full machine learning pipeline using margin regression with Optuna hyper-parameter optimisation, leave-one-season-out cross-validation, isotonic calibration, and vectorised feature computation for speed.

V14: Brier score 0.132 (~100 lines of code)
V15: Brier score 0.166 (~12 min runtime, 400+ lines)

Despite running for ~12 minutes and using far more engineered features, V15 scored 0.166 — significantly worse than V14's 0.132. The lesson was clear: in a domain where efficient markets exist, the market price is an extraordinarily strong baseline that hand-crafted features cannot beat without genuinely novel information.

Key Takeaways
Markets are hard to beat

Vegas spreads compress a vast amount of private and public information into a single number. Any model that ignores this signal is leaving easy accuracy on the table.

Elo adds marginal value

A small Elo correction (α ≈ 0.05) can improve on pure spread predictions, suggesting the market isn't perfectly efficient — but the edge is thin.

Complexity ≠ accuracy

V15's gradient-boosted, cross-validated, Optuna-tuned pipeline was objectively more sophisticated than V14's 100-line spread converter — and objectively worse.

Calibration matters more than discrimination

In Brier-scored competitions, a well-calibrated simple model outperforms a poorly-calibrated complex one. The normal CDF transform provides naturally well-calibrated probabilities.

Women's markets are less efficient

Blending with prior estimates helped for women's games, suggesting less liquid betting markets leave more room for statistical models to contribute.