Probaballer - Football Analytics & Betting Insights

Research Deep Dive | arXiv 2026

GenTac: Generative Tactics for Open-Play Football

A diffusion-based framework that generates realistic, controllable open-play football tactics — conditioning multi-agent trajectories on team identity, opponent, league style, and high-level tactical objectives.

Diffusion Models | Multi-Agent Generation | Conditional Sampling | TacBench Benchmark | 40 min read

Authors: Jiayuan Rao et al. — Shanghai Jiao Tong University (2026)


The Tactical Problem

Most trajectory models in football tackle deterministic sub-problems: predict where a defender will be 4 seconds from now, or impute a missing player's position. But open-play tactics are inherently non-deterministic — there are many plausible ways a possession could unfold, and tactical analysis cares about the distribution of plausible futures, not a single average.

GenTac reframes the task as generation, not prediction: given a short context window of 22 player + ball trajectories, sample multiple realistic continuations of open play that obey football's tactical regularities. Crucially, the user can steer the generation with conditioning signals — e.g. "simulate how Manchester City would attack against a Burnley low block".

Why Existing Methods Fall Short

The paper argues that prior trajectory work — TranSPORTmer, Diffoot, CausalTraj, SportsNGEN — solves pieces of the problem, but none of them give analysts a single controllable knob over tactical style:

❌ Single-Mode Outputs

Point-prediction models (Social-LSTM, TranSPORTmer's forecasting head) collapse a multi-modal distribution into one trajectory. Tactical analysis needs "what could happen", not just "what is most likely".

❌ No Tactical Control

Existing diffusion approaches (Diffoot) generate diverse samples but are conditioned only on the observed past — there's no way to ask the model for a specific style, opponent, or objective.

❌ No Standard Evaluation

There is no shared benchmark that measures whether a generated possession is actually tactically meaningful — recognisable as a counter-attack, a build-up, a press-trigger, etc.

GenTac's Solution

Build a single conditional diffusion backbone that supports five conditioning modes over the same generative process — and ship a paired benchmark (TacBench) that scores both trajectory realism and downstream tactical-event recognisability.

GenTac Architecture: Conditional Diffusion over 23 Agents

Causal sliding window + spatiotemporal denoiser + multi-modal conditioning

The denoiser predicts the joint trajectory of 22 players + 1 ball over a short horizon, advancing a causal sliding window in 0.2s hops. Each sampling pass denoises a chunk of trajectories from Gaussian noise back to a tactically coherent rollout, conditioned on the observed context and on whichever of the five conditioning modes described below is active.

Backbone: Spatiotemporal Denoiser

Set-attention over the 23 agents at each frame, plus a temporal transformer along the time axis. Permutation-invariant in the player dimension; ball is treated as a special token with its own embedding.

• 0.2s causal window (5 frames at 25 FPS)
• v-prediction loss (a widely used, numerically stable diffusion training target)
• DDIM sampling at inference
• K = 20 samples per context for evaluation
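
To make the v-prediction and DDIM bullets concrete, here is a minimal NumPy sketch of the standard v-prediction target and one deterministic DDIM update on a 5-frame, 23-agent chunk. The cosine/sine noise level, the function names and the random toy data are illustrative assumptions; this is not GenTac's actual schedule or code.

```python
import numpy as np

def v_target(x0, eps, alpha, sigma):
    """Standard v-prediction target: v = alpha * eps - sigma * x0."""
    return alpha * eps - sigma * x0

def ddim_step(x_t, v_hat, alpha_t, sigma_t, alpha_s, sigma_s):
    """One deterministic DDIM update from noise level t to a cleaner level s.

    Assumes a variance-preserving schedule (alpha**2 + sigma**2 == 1).
    """
    x0_hat = alpha_t * x_t - sigma_t * v_hat   # clean chunk implied by v_hat
    eps_hat = sigma_t * x_t + alpha_t * v_hat  # noise implied by v_hat
    return alpha_s * x0_hat + sigma_s * eps_hat

rng = np.random.default_rng(0)
frames, agents = 5, 23                      # 0.2 s at 25 FPS, 22 players + ball
x0 = rng.uniform(-1.0, 1.0, size=(frames, agents, 2))  # toy (x, y) positions
eps = rng.standard_normal(x0.shape)

# Assumed noise level for illustration: any pair with alpha^2 + sigma^2 = 1 works.
alpha_t, sigma_t = np.cos(1.2), np.sin(1.2)
x_t = alpha_t * x0 + sigma_t * eps          # forward-noised chunk
v = v_target(x0, eps, alpha_t, sigma_t)     # what the network is trained to output

# With the exact v, one step to the noise-free level recovers the clean chunk.
x_rec = ddim_step(x_t, v, alpha_t, sigma_t, alpha_s=1.0, sigma_s=0.0)
print(np.allclose(x_rec, x0))  # True
```

In practice the network only approximates v, so sampling takes several such steps through intermediate noise levels rather than one jump.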

Conditioning: Five Modes, One Model

The same backbone is trained with classifier-free guidance over a multiplexed conditioning vector, so a single checkpoint serves five inference regimes:

• Unconditioned — pure prior over open play
• Opponent-conditioned — given opposing team ID
• Team-conditioned — generate in a chosen team's style
• League-conditioned — EPL vs. La Liga vs. Bundesliga, etc.
• Objective-conditioned — e.g. counter-attack, sustained build-up
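
A rough sketch of how one checkpoint could serve all five regimes with classifier-free guidance: each mode gets a slot in the conditioning vector, absent modes are filled with a null token, and at inference the conditional and unconditional outputs are combined. The slot layout, `NULL_ID`, whole-vector dropout and the toy denoiser are assumptions made for illustration; the article does not spell out GenTac's exact encoding.

```python
import numpy as np

MODES = ("opponent", "team", "league", "objective")
NULL_ID = 0  # hypothetical reserved "no condition" token; real IDs start at 1

def make_condition(**ids):
    """Build a multiplexed condition: one slot per mode, NULL_ID where absent."""
    unknown = set(ids) - set(MODES)
    if unknown:
        raise ValueError(f"unknown conditioning modes: {sorted(unknown)}")
    return np.array([ids.get(m, NULL_ID) for m in MODES], dtype=np.int64)

def drop_condition(cond, p_drop, rng):
    """Training-time dropout: null the whole condition with probability p_drop."""
    return np.full_like(cond, NULL_ID) if rng.random() < p_drop else cond

def guided_output(denoiser, x_t, cond, scale):
    """Classifier-free guidance: move from the unconditional output towards
    the conditional one; scale=0 is unconditioned, scale=1 is plain conditional."""
    out_uncond = denoiser(x_t, np.full_like(cond, NULL_ID))
    out_cond = denoiser(x_t, cond)
    return out_uncond + scale * (out_cond - out_uncond)

def toy_denoiser(x_t, cond):
    """Stand-in for the real network: shifts the input by the sum of IDs."""
    return x_t + 0.01 * cond.sum()

x_t = np.zeros((5, 23, 2))
cond = make_condition(team=7, objective=2)   # team- and objective-conditioned
out = guided_output(toy_denoiser, x_t, cond, scale=2.0)
print(cond.tolist())  # [0, 7, 0, 2]
```

Calling `make_condition()` with no arguments yields the all-null vector, which corresponds to the unconditioned mode in this sketch.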

Why Five Modes Matter

An unconditioned model is a prior. An opponent-conditioned model is a scout report. A team-conditioned model is a style transfer. A league-conditioned model captures tactical culture. An objective-conditioned model is closest to what coaches actually want — "show me what a press-bypass through midfield looks like for this side". GenTac is the first paper to put all five behind a unified API.

TacBench: A Benchmark for Tactical Generation

Two paired tasks — trajectory forecasting and tactical-event recognition

Generative trajectory models have historically been evaluated with ADE/FDE — average and final displacement error against ground truth. But ADE/FDE penalises a sample for being different from the one rollout that actually happened, even when it's perfectly plausible. TacBench fixes this by adding a tactical-recognisability task on top.

TacBench-Trajectory

2,838 open-play segments with held-out continuations. Models generate K = 20 samples per context; metrics include best-of-K ADE/FDE plus diversity / coverage scores.

• Trajectory realism (kinematic plausibility)
• Min-of-K ADE/FDE against ground truth
• Sample diversity within each context
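
The metrics above can be sketched in a few lines of NumPy. The array layout, the averaging over agents and the pairwise-distance diversity score are assumptions chosen for illustration; TacBench's exact definitions may differ.

```python
import numpy as np

def min_of_k_ade_fde(samples, truth):
    """Best-of-K displacement errors.

    samples: (K, T, A, 2) generated rollouts; truth: (T, A, 2) ground truth.
    Returns (minADE, minFDE); the two minima may come from different samples.
    """
    dist = np.linalg.norm(samples - truth[None], axis=-1)  # (K, T, A)
    ade = dist.mean(axis=(1, 2))     # per sample: mean over frames and agents
    fde = dist[:, -1].mean(axis=1)   # per sample: final frame only
    return ade.min(), fde.min()

def mean_pairwise_distance(samples):
    """Simple diversity score: mean displacement between distinct sample pairs."""
    k = samples.shape[0]
    diff = samples[:, None] - samples[None]                    # (K, K, T, A, 2)
    pair = np.linalg.norm(diff, axis=-1).mean(axis=(2, 3))     # (K, K)
    return pair[np.triu_indices(k, 1)].mean()

# Toy check: two samples offset from the truth by constant vectors.
truth = np.zeros((3, 2, 2))
samples = np.stack([truth + np.array([3.0, 4.0]),   # 5 units off everywhere
                    truth + np.array([0.0, 1.0])])  # 1 unit off everywhere
min_ade, min_fde = min_of_k_ade_fde(samples, truth)
print(min_ade, min_fde)  # 1.0 1.0
```

Taking the minimum over K rewards a model for covering the realised future with at least one sample, which is why it is paired with a diversity score: without one, a model could hedge with near-identical samples.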

TacBench-Event

423 segments labelled with 5 tactical event types covering 15 sub-types (e.g. counter-attack, half-space combination, switch of play, high press trigger). Generated rollouts are scored on whether a downstream classifier still recognises the intended tactic.

• 5 high-level event types
• 15 fine-grained sub-types
• Top-1 type / sub-type accuracy on samples
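
As a small illustration of the scoring, the sketch below computes top-1 accuracy at both levels from a classifier's predictions on generated samples. The label names and the nested sub-type to type mapping are hypothetical placeholders, not TacBench's actual taxonomy.

```python
def top1_accuracy(predicted, intended):
    """Fraction of generated samples whose predicted label equals the intended one."""
    if len(predicted) != len(intended):
        raise ValueError("predicted and intended must have the same length")
    if not intended:
        raise ValueError("need at least one sample")
    return sum(p == t for p, t in zip(predicted, intended)) / len(intended)

# Hypothetical nested taxonomy: every sub-type belongs to exactly one type.
SUBTYPE_TO_TYPE = {
    "sub_a1": "type_a",
    "sub_a2": "type_a",
    "sub_b1": "type_b",
}

def type_and_subtype_accuracy(pred_subtypes, intended_subtypes, mapping=SUBTYPE_TO_TYPE):
    """Return (type accuracy, sub-type accuracy) for sub-type-level predictions."""
    sub_acc = top1_accuracy(pred_subtypes, intended_subtypes)
    type_acc = top1_accuracy([mapping[s] for s in pred_subtypes],
                             [mapping[s] for s in intended_subtypes])
    return type_acc, sub_acc

pred = ["sub_a1", "sub_a2", "sub_b1", "sub_a1"]
want = ["sub_a1", "sub_a1", "sub_b1", "sub_b1"]
print(type_and_subtype_accuracy(pred, want))  # (0.75, 0.5)
```

Under a nested taxonomy like this one, a correct sub-type implies a correct type, so type accuracy can never fall below sub-type accuracy.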

Headline Results

Trajectory Forecasting

Best-of-K ADE/FDE on TacBench-Trajectory beats Diffoot, TranSPORTmer and Social-STGCNN baselines, particularly in the longer-horizon regime where multi-modality matters most.

Event Recognition

71.2% top-1 accuracy on tactical event type and 53.7% on the harder sub-type task — generated rollouts remain recognisable as the intended tactic to a downstream classifier.

Conditioning Works

Team-, league- and objective-conditioned samples shift summary statistics (possession shape, defensive line height, vertical progression rate) in directions consistent with the requested style.

Cross-Sport Generalisation

The same backbone runs on basketball, American football and ice hockey

One of the more striking results: because the denoiser is permutation-invariant over agents and makes only mild assumptions about pitch geometry, GenTac can be fine-tuned on other team-sport tracking data with minimal changes. The paper reports plausible generations on:

🏀 Basketball

10 agents + ball, half-court tracking. Generated possessions reproduce pick-and-roll geometry and motion-offense spacing.

🏈 American Football

22 agents on a longer field. Captures route concepts and coverage shells from snap to first read.

🏒 Ice Hockey

12 agents + puck, faster cycle times. Conditional samples respect zone-entry and forecheck structure.
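
The property behind this portability can be checked on a toy layer. The sketch below is a single-head self-attention over agents with no positional encoding; it accepts any number of agents with the same weights, and reordering the agents simply reorders the outputs (strictly speaking, permutation equivariance; invariance follows once outputs are pooled). The layer, feature size and random weights are illustrative assumptions, not GenTac's denoiser.

```python
import numpy as np

def set_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over agents, no positional encoding.

    x: (A, D) per-agent features for one frame; A may vary between calls.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 8  # hypothetical feature width
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))

# Agent counts as described above (players plus ball or puck where listed).
agent_counts = {"football": 23, "basketball": 11,
                "american_football": 22, "ice_hockey": 13}

for sport, n_agents in agent_counts.items():
    x = rng.standard_normal((n_agents, d))
    perm = rng.permutation(n_agents)
    out = set_attention(x, w_q, w_k, w_v)
    out_perm = set_attention(x[perm], w_q, w_k, w_v)
    # Shuffling the agents shuffles the outputs in exactly the same way.
    assert np.allclose(out[perm], out_perm), sport
```

Because nothing in the layer depends on agent count or order, only the data and any sport-specific embeddings would need to change in a setup like this.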

Why This Matters

Cross-sport portability is a soft argument that GenTac has learned something about multi-agent coordination under spatial constraints, not just memorised the geometry of one league's pitch. That is precisely the kind of representation a foundation model for team-sport tracking should learn.

GenTac vs. Diffoot vs. TranSPORTmer vs. CausalTraj

Aspect	TranSPORTmer	Diffoot	CausalTraj	GenTac
Output Type	Point	Multi-modal samples	Joint multi-modal	Joint multi-modal + controllable
Agents Modelled	22 + ball	Defenders only (11)	22 + ball	22 + ball
Conditioning	Past trajectories	Past + graph	Past + causal structure	5 modes (uncond. / team / opp. / league / obj.)
Evaluation	ADE/FDE + impute/classify	ADE/FDE + direction	ADE/FDE + coherence	TacBench (trajectory + event)
Cross-Sport	Football / basketball	Football only	Football only	Football, basketball, American football, ice hockey
Best Use Case	Real-time multi-task	Defensive scouting	Coherent rollouts	Tactical scenario design, "what-if" analysis

Limitations & Open Questions

1. Short Horizon

The 0.2s causal window keeps generation tractable but means GenTac is really a stitched short-rollout model rather than a true minute-long tactical simulator. Long horizons compound sampling noise.
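
A toy version of the stitching makes the limitation easier to see. In the sketch below, each hop generates 5 frames (0.2s at 25 FPS) and feeds them back in as context for the next hop. The sampler is a stand-in (constant velocity plus Gaussian noise), and the context length and noise scale are assumptions; only the hop size comes from the article.

```python
import numpy as np

FPS = 25
HOP_SECONDS = 0.2
HOP_FRAMES = round(FPS * HOP_SECONDS)  # 5 frames per hop

def toy_sampler(context, n_frames, rng, noise_std=0.05):
    """Stand-in for the diffusion sampler: constant velocity plus noise.

    context: (T, A, 2) most recent frames; returns (n_frames, A, 2).
    """
    velocity = context[-1] - context[-2]
    steps = np.arange(1, n_frames + 1)[:, None, None]
    noise = noise_std * rng.standard_normal((n_frames,) + context.shape[1:])
    return context[-1] + steps * velocity + noise

def stitched_rollout(context, n_hops, sampler, rng, context_len=None):
    """Generate n_hops chunks, feeding each generated chunk back as context."""
    context_len = context_len or len(context)
    frames = context.copy()
    for _ in range(n_hops):
        chunk = sampler(frames[-context_len:], HOP_FRAMES, rng)
        frames = np.concatenate([frames, chunk], axis=0)
    return frames[len(context):]  # generated frames only

rng = np.random.default_rng(0)
context = np.zeros((HOP_FRAMES, 23, 2))        # toy observed window
rollout = stitched_rollout(context, n_hops=50, sampler=toy_sampler, rng=rng)
print(rollout.shape)  # (250, 23, 2), i.e. 10 s of play from 50 hops
```

Since every hop conditions on frames that already contain sampling noise, the spread between independent rollouts grows with the number of hops in this toy setup, which mirrors the compounding described above.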

2. Event Sub-types Are Hard

53.7% sub-type accuracy is impressive but leaves a lot on the table — fine-grained tactical concepts (e.g. half-space underlap vs. third-man combination) are still partially out of reach.

3. Inference Cost

Multi-step DDIM sampling × K = 20 samples means (denoising steps × 20) network evaluations per context, against one for a single-shot point predictor. Fine for offline scouting; tight for real-time use.

4. Conditioning Granularity

"Team identity" and "objective" are coarse handles. Coaches typically think in terms of structures and triggers; bridging that to a useful conditioning vocabulary is open work.

Resources & Further Reading

Read the GenTac Paper

arXiv 2604.11786

Official Code Repository

GitHub — jyrao/GenTac