This model was built for the 2026 Kaggle March Machine Learning Mania competition. The objective: predict the win probability for every possible NCAA tournament matchup (men's and women's) and minimise the Brier score, the mean squared error between predicted probabilities and actual binary outcomes.
Our final model (V14) achieved a combined Brier score of 0.132, using Vegas spread lines as the primary signal with a small Elo-based correction. The journey to get there spanned 15 iterations and taught us a key lesson: market-derived features dominate hand-crafted box-score statistics.
## V1–V6: Box-Score Features & Traditional ML
The first wave explored traditional machine learning built on NCAA box-score statistics: field goal percentages, rebounds, turnovers, assist-to-turnover ratios, and tempo. We engineered season-level aggregates for each team, then trained models (logistic regression, gradient boosting) to predict outcomes.
Results were middling. The models captured broad seed-tier effects (1-seeds beat 16-seeds) but struggled with the nuanced mid-bracket matchups that separate good submissions from great ones. V6 showed signs of overfitting with a per-gender tuning approach that didn't generalise out of sample.
## V7–V11: Elo Ratings & Seed Priors
The second phase introduced two interpretable features that dramatically improved performance:
- **Elo ratings**, computed from regular-season results:
  - K-factor of 20
  - Home-court advantage: +3.5 points
  - Margin-of-victory multiplier: log(|m| + 1) × 2.2 / (|Δelo| × 0.006 + 2.2)
  - Between-season mean reversion: 65% toward 1500
  - Regular-season cutoff at day 133
- **Seed priors**, from historical tournament results:
  - Historical win rate for each seed-vs-seed pair
  - Minimum of 5 games required to use the historical rate
  - Linear fallback otherwise: +3% per seed of difference
  - e.g. 1-seed vs 8-seed: ~80% historical win rate
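As a concrete illustration, the Elo machinery above can be sketched as follows. This is a minimal sketch assuming the standard logistic Elo expectancy curve; the function names are ours, and the conversion of the +3.5-point home edge into rating points is omitted for brevity:

```python
from math import log

K = 20.0  # K-factor from the list above

def expected_score(elo_a: float, elo_b: float) -> float:
    """Standard logistic Elo win expectancy for team A."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def mov_multiplier(margin: float, elo_diff: float) -> float:
    """Margin-of-victory multiplier: log(|m| + 1) * 2.2 / (|Δelo| * 0.006 + 2.2).
    Dampens rating swings from blowouts between mismatched teams."""
    return log(abs(margin) + 1.0) * 2.2 / (abs(elo_diff) * 0.006 + 2.2)

def update(elo_winner: float, elo_loser: float, margin: float) -> tuple[float, float]:
    """Post-game update: points gained by the winner equal points lost by
    the loser, scaled by K and the margin multiplier."""
    gain = (K * mov_multiplier(margin, elo_winner - elo_loser)
            * (1.0 - expected_score(elo_winner, elo_loser)))
    return elo_winner + gain, elo_loser - gain

def season_reset(elo: float, reversion: float = 0.65, base: float = 1500.0) -> float:
    """Revert each rating 65% of the way toward 1500 between seasons."""
    return elo + reversion * (base - elo)
```

Ratings are zero-sum per game, so the pool mean only moves through the between-season reversion step.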
V11 combined these via blending (Elo, seeds, and an earlier submission as a regularisation anchor), achieving meaningful improvement. However, Brier scores plateaued around 0.14–0.15.
## V14: The Breakthrough – Vegas Spreads
The breakthrough came from a simple insight: Vegas spread lines encode an enormous amount of information. Oddsmakers aggregate injury reports, travel schedules, matchup-specific tendencies, motivation, and public betting patterns: information that no box-score model can capture from summary statistics alone.
Convert each game's point spread to a win probability using a normal CDF:

P(win) = Φ(spread / σ)

where the spread is the number of points the team is favoured by and Φ is the standard normal CDF. σ = 11.5 was calibrated via leave-one-season-out cross-validation across historical tournament odds data.
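A minimal sketch of this conversion (the function name is ours):

```python
from math import erf, sqrt

SIGMA = 11.5  # calibrated via leave-one-season-out cross-validation

def spread_to_win_prob(spread: float, sigma: float = SIGMA) -> float:
    """Win probability for the team favoured by `spread` points:
    P(win) = Phi(spread / sigma), using the standard normal CDF."""
    return 0.5 * (1.0 + erf(spread / (sigma * sqrt(2.0))))
```

A pick'em game (spread of 0) maps to 0.5, and an 11.5-point favourite to roughly 0.84.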
Where Elo disagrees with the spread-implied probability, apply a small correction, moving the prediction a fraction α of the way toward the Elo-implied probability:

p = (1 − α) · p_spread + α · p_elo

Leave-one-year-out validation showed α = 0.05–0.10 was optimal for men's games, a very light touch. Larger α values degraded performance, confirming the spread already captures most of what Elo measures.
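The correction amounts to a convex blend; a sketch (function name ours):

```python
def nudge_toward_elo(p_spread: float, p_elo: float, alpha: float = 0.05) -> float:
    """Move the spread-implied probability a fraction alpha toward the
    Elo-implied probability; alpha = 0.05 was optimal for men's games."""
    return (1.0 - alpha) * p_spread + alpha * p_elo
```

With alpha = 0.05, a spread probability of 0.80 against an Elo probability of 0.60 moves only to 0.79.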
Women's tournament spreads are less efficient (thinner betting markets). For women's games, we blended 50/50 with a prior submission as regularisation.
When no spread was available, we fell back to a blend of 40% Elo, 30% seed priors, and 30% V1 baseline. All probabilities were clipped to [0.005, 0.995] to avoid catastrophic Brier penalties on upsets.
The choice of σ = 11.5 was determined via grid search over σ ∈ [8.0, 20.0] for each historical tournament year:
| Season | Optimal σ | Brier Score |
|---|---|---|
| 2021 | 11.0 | 0.118 |
| 2022 | 12.0 | 0.137 |
| 2023 | 11.5 | 0.126 |
| 2024 | 11.5 | 0.131 |
σ = 11.5 was consistently near-optimal and was selected as the fixed parameter.
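The per-season search can be sketched as follows. Helper names are ours; `spreads` and `outcomes` stand for the per-game point spreads and binary results of one held-out season:

```python
from math import erf, sqrt

def win_prob(spread: float, sigma: float) -> float:
    """Normal-CDF conversion from point spread to win probability."""
    return 0.5 * (1.0 + erf(spread / (sigma * sqrt(2.0))))

def brier(probs, outcomes):
    """Mean squared error between probabilities and binary outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def best_sigma(spreads, outcomes, grid=None):
    """Return the sigma in [8.0, 20.0] minimising the Brier score."""
    grid = grid or [8.0 + 0.5 * i for i in range(25)]  # 8.0, 8.5, ..., 20.0
    return min(grid, key=lambda s: brier([win_prob(x, s) for x in spreads],
                                         outcomes))
```

Running this once per historical season and inspecting the spread of optima is what justified fixing a single σ rather than tuning it per year.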
Several variants of the spread model were also considered:

| Variant | Description |
|---|---|
| pure_spread | Spread only, no correction |
| elo_005 | Spread + 5% Elo nudge |
| elo_010 | Spread + 10% Elo nudge |
| elo_015 | Spread + 15% Elo nudge |
| seed_90 | 90% spread + 10% seed matchup prior |
After V14's success, V15 attempted a more sophisticated approach: a full machine learning pipeline using margin regression with Optuna hyper-parameter optimisation, leave-one-season-out cross-validation, isotonic calibration, and vectorised feature computation for speed.
Despite running for ~12 minutes and using far more engineered features, V15 scored 0.166, significantly worse than V14's 0.132. The lesson was clear: in a domain where efficient markets exist, the market price is an extraordinarily strong baseline that hand-crafted features cannot beat without genuinely novel information.
- Vegas spreads compress a vast amount of private and public information into a single number. Any model that ignores this signal is leaving easy accuracy on the table.
- A small Elo correction (α ≈ 0.05) can improve on pure spread predictions, suggesting the market isn't perfectly efficient, but the edge is thin.
- V15's gradient-boosted, cross-validated, Optuna-tuned pipeline was objectively more sophisticated than V14's 100-line spread converter, and objectively worse.
- In Brier-scored competitions, a well-calibrated simple model outperforms a poorly calibrated complex one. The normal CDF transform provides naturally well-calibrated probabilities.
- Blending with prior estimates helped for women's games, suggesting less liquid betting markets leave more room for statistical models to contribute.