Most trajectory models in football tackle deterministic sub-problems: predict where a defender will be 4 seconds from now, or impute a missing player's position. But open-play tactics are inherently non-deterministic — there are many plausible ways a possession could unfold, and tactical analysis cares about the distribution of plausible futures, not a single average.
GenTac reframes the task as generation, not prediction: given a short context window of 22 player + ball trajectories, sample multiple realistic continuations of open play that obey football's tactical regularities. Crucially, the user can steer the generation with conditioning signals — e.g. "simulate how Manchester City would attack against a Burnley low block".
The paper argues that prior trajectory work (TranSPORTmer, Diffoot, CausalTraj, SportsNGEN) solves pieces of the problem, but none of it gives analysts a single controllable knob over tactical style:
- Point-prediction models (Social-LSTM, TranSPORTmer's forecasting head) collapse a multi-modal distribution into one trajectory. Tactical analysis needs "what could happen", not just "what is most likely".
- Existing diffusion approaches (Diffoot) generate diverse samples but are conditioned only on the observed past, so there is no way to ask the model for a specific style, opponent, or objective.
- There is no shared benchmark that measures whether a generated possession is actually tactically meaningful, i.e. recognisable as a counter-attack, a build-up, a press-trigger, etc.
GenTac's answer is to build a single conditional diffusion backbone that supports five conditioning modes over the same generative process, and to ship a paired benchmark (TacBench) that scores both trajectory realism and downstream tactical-event recognisability.
The denoiser predicts the joint trajectory of 22 players + 1 ball over a short horizon, using a causal sliding window with 0.2 s hops. Each forward pass denoises a chunk of trajectories from Gaussian noise back to a tactically coherent rollout, conditioned on the observed context window and, where supplied, a conditioning signal (see the five modes below).
The architecture applies set-attention over the 23 agents at each frame, plus a temporal transformer along the time axis. It is permutation-invariant in the player dimension; the ball is treated as a special token with its own embedding.
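To make the permutation property concrete, here is a minimal NumPy sketch of single-head attention over the 23 agent tokens of one frame. It is not the paper's architecture: the weight matrices, the 16-dimensional tokens and `ball_type_embedding` are hypothetical stand-ins. It only checks that, without positional encodings, reordering the players reorders their outputs in the same way (strictly speaking, equivariance) and leaves the ball token's output unchanged.

```python
import numpy as np

def set_attention(tokens, wq, wk, wv):
    """Single-head self-attention over agents, with no positional encoding."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n_players, dim = 22, 16                       # dim is an arbitrary choice
players = rng.normal(size=(n_players, dim))   # one token per player
ball = rng.normal(size=(1, dim))              # one token for the ball
ball_type_embedding = rng.normal(size=(1, dim))  # hypothetical learned vector
wq, wk, wv = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

def encode_frame(players, ball):
    # The ball is marked as a special token by adding its own embedding.
    tokens = np.concatenate([players, ball + ball_type_embedding], axis=0)
    return set_attention(tokens, wq, wk, wv)

perm = rng.permutation(n_players)
out = encode_frame(players, ball)
out_perm = encode_frame(players[perm], ball)

# Player outputs follow the permutation; the ball output does not move.
players_follow_perm = np.allclose(out[:n_players][perm], out_perm[:n_players])
ball_unchanged = np.allclose(out[-1], out_perm[-1])
```

In practice this means the model does not need a canonical player ordering (for example by shirt number or role) in the tracking data.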
- 0.2s causal window (5 frames at 25 FPS)
- v-prediction loss (stable, well-known to work)
- DDIM sampling at inference
- K = 20 samples per context for evaluation
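The v-prediction target and a deterministic DDIM update can be sketched in a few lines. This is a generic illustration rather than GenTac's training code: the cosine noise schedule, the 10-step grid and the "oracle" denoiser (which returns the exact target) are assumptions made so the example is self-checking. The chunk shape (5 frames, 23 agents, x/y) follows the numbers quoted above.

```python
import numpy as np

def alpha_sigma(t):
    """Cosine schedule (an assumption), t in [0, 1], alpha^2 + sigma^2 = 1."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def v_target(x0, eps, t):
    """v-prediction regression target: v = alpha * eps - sigma * x0."""
    a, s = alpha_sigma(t)
    return a * eps - s * x0

def ddim_step(x_t, v_pred, t, t_next):
    """Deterministic DDIM update (eta = 0) from time t to t_next."""
    a, s = alpha_sigma(t)
    x0_hat = a * x_t - s * v_pred    # implied clean chunk
    eps_hat = s * x_t + a * v_pred   # implied noise
    a_next, s_next = alpha_sigma(t_next)
    return a_next * x0_hat + s_next * eps_hat

rng = np.random.default_rng(0)
x0 = rng.normal(size=(5, 23, 2))   # 5 frames, 22 players + ball, (x, y)
eps = rng.normal(size=x0.shape)

# Start from pure noise (t = 1) and walk to t = 0 in 10 steps.
# An oracle stands in for the trained denoiser by returning the exact target.
ts = np.linspace(1.0, 0.0, 11)
x = eps.copy()
for t, t_next in zip(ts[:-1], ts[1:]):
    v = v_target(x0, eps, t)
    x = ddim_step(x, v, t, t_next)

recovered = np.allclose(x, x0)
```

With a perfect denoiser the loop lands exactly on the clean chunk; with a learned one, the number of steps trades sample quality against inference cost.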
The same backbone is trained with classifier-free guidance over a multiplexed conditioning vector, so a single checkpoint serves five inference regimes:
- Unconditioned: pure prior over open play
- Opponent-conditioned: given opposing team ID
- Team-conditioned: generate in a chosen team's style
- League-conditioned: EPL vs. La Liga vs. Bundesliga, etc.
- Objective-conditioned: e.g. counter-attack, sustained build-up
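A rough sketch of how one checkpoint could serve all five regimes with classifier-free guidance is shown below. Everything named here is hypothetical: the field names, the `NULL_ID` placeholder, the per-field dropout rule and the `toy_denoiser` stand-in are illustrative assumptions, not the paper's interface. The guidance formula is one common parameterisation, in which a scale of 0 gives the unconditioned output and a scale of 1 gives the plain conditioned output.

```python
import numpy as np

COND_FIELDS = ("opponent", "team", "league", "objective")  # hypothetical names
NULL_ID = 0  # hypothetical "no condition" token shared by every field

def make_condition(**fields):
    """Multiplexed conditioning record; unspecified fields stay null."""
    return {name: fields.get(name, NULL_ID) for name in COND_FIELDS}

def drop_conditions(cond, rng, p_drop=0.1):
    """Training-time dropout: null each field independently (an assumption)."""
    return {k: (NULL_ID if rng.random() < p_drop else v) for k, v in cond.items()}

def guided_prediction(denoiser, x_t, t, cond, guidance_scale=2.0):
    """Classifier-free guidance: extrapolate from unconditioned to conditioned."""
    out_uncond = denoiser(x_t, t, make_condition())
    out_cond = denoiser(x_t, t, cond)
    return out_uncond + guidance_scale * (out_cond - out_uncond)

def toy_denoiser(x_t, t, cond):
    """Stand-in for the trained network: shifts the output by the condition ids."""
    return 0.5 * x_t + 0.01 * sum(cond.values())

x_t = np.zeros((5, 23, 2))
cond = make_condition(team=7, objective=2)   # team + objective, no opponent/league
guided = guided_prediction(toy_denoiser, x_t, 0.5, cond, guidance_scale=2.0)
```

Under this scheme, the unconditioned mode is simply `make_condition()` with every field left null, and mixed modes such as team plus objective need no additional training.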
An unconditioned model is a prior. An opponent-conditioned model is a scout report. A team-conditioned model is a style transfer. A league-conditioned model captures tactical culture. An objective-conditioned model is closest to what coaches actually want — "show me what a press-bypass through midfield looks like for this side". GenTac is the first paper to put all five behind a unified API.
Generative trajectory models have historically been evaluated with ADE/FDE — average and final displacement error against ground truth. But ADE/FDE penalises a sample for being different from the one rollout that actually happened, even when it's perfectly plausible. TacBench fixes this by adding a tactical-recognisability task on top.
TacBench-Trajectory contains 2,838 open-play segments with held-out continuations. Models generate K = 20 samples per context; metrics include best-of-K ADE/FDE plus diversity and coverage scores.
- Trajectory realism (kinematic plausibility)
- Min-of-K ADE/FDE against ground truth
- Sample diversity within each context
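The min-of-K displacement metrics and a simple diversity score can be sketched as follows. The details are assumptions, since the benchmark's exact definitions are not given here: errors are averaged jointly over all 23 agents, the minimum is taken separately for ADE and FDE, and diversity is measured as the mean pairwise distance between samples.

```python
import numpy as np

def min_of_k_ade_fde(samples, truth):
    """samples: (K, T, A, 2) generated rollouts; truth: (T, A, 2)."""
    dist = np.linalg.norm(samples - truth[None], axis=-1)  # (K, T, A)
    ade = dist.mean(axis=(1, 2))        # average error per sample
    fde = dist[:, -1].mean(axis=-1)     # final-frame error per sample
    return ade.min(), fde.min()

def mean_pairwise_distance(samples):
    """Diversity proxy: mean point-wise distance over all pairs of samples."""
    k = samples.shape[0]
    flat = samples.reshape(k, -1, 2)
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total += np.linalg.norm(flat[i] - flat[j], axis=-1).mean()
            pairs += 1
    return total / pairs

rng = np.random.default_rng(0)
truth = rng.normal(size=(5, 23, 2))                 # held-out continuation
samples = truth[None] + rng.normal(scale=0.5, size=(20, 5, 23, 2))
samples[3] = truth                                  # pretend one sample is exact

best_ade, best_fde = min_of_k_ade_fde(samples, truth)
diversity = mean_pairwise_distance(samples)
```

Because only the closest of the K samples counts, the other nineteen are free to explore different plausible futures without being penalised, which is why a diversity term is reported alongside.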
The tactical-event task contains 423 segments labelled with 5 tactical event types covering 15 sub-types (e.g. counter-attack, half-space combination, switch of play, high press trigger). Generated rollouts are scored on whether a downstream classifier still recognises the intended tactic.
- 5 high-level event types
- 15 fine-grained sub-types
- Top-1 type / sub-type accuracy on samples
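As an illustration of the recognisability score, the sketch below computes top-1 accuracy from classifier scores on generated samples. How the benchmark aggregates over the K samples per context is not stated here, so pooling all samples equally is an assumption, and the toy scores are made up for the example.

```python
import numpy as np

def top1_accuracy(scores, intended):
    """scores: (N, K, C) classifier scores for K samples of N contexts;
    intended: (N,) index of the tactic each context was meant to show."""
    predicted = scores.argmax(axis=-1)                    # (N, K)
    return float((predicted == intended[:, None]).mean())

# Toy example: 2 contexts, 2 samples each, 3 event types.
scores = np.array([
    [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]],   # context 0: predicted types 0 and 1
    [[0.0, 0.0, 5.0], [1.0, 0.0, 4.0]],   # context 1: predicted types 2 and 2
])
intended = np.array([0, 2])
accuracy = top1_accuracy(scores, intended)  # 3 of 4 samples match -> 0.75
```

The same function applies to the 15-way sub-type task by changing the number of classes.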
Best-of-K ADE/FDE on TacBench-Trajectory beats Diffoot, TranSPORTmer and Social-STGCNN baselines, particularly in the longer-horizon regime where multi-modality matters most.
On the event task, generated samples reach 71.2% top-1 accuracy on tactical event type and 53.7% on the harder sub-type task, so generated rollouts remain recognisable as the intended tactic to a downstream classifier.
Team-, league- and objective-conditioned samples shift summary statistics (possession shape, defensive line height, vertical progression rate) in directions consistent with the requested style.
One of the more striking results: because the denoiser is permutation-invariant over agents and only mildly assumes pitch geometry, GenTac fine-tunes onto other team-sport tracking data with minimal changes. The paper reports plausible generations on:
- Basketball: 10 agents + ball, half-court tracking. Generated possessions reproduce pick-and-roll geometry and motion-offense spacing.
- American football (NFL): 22 agents on a longer field. Captures route concepts and coverage shells from snap to first read.
- Hockey: 12 agents + puck, faster cycle times. Conditional samples respect zone-entry and forecheck structure.
Cross-sport portability is a soft argument that GenTac has learned something about multi-agent coordination under spatial constraints, not just memorised the geometry of one league's pitch. That is precisely the kind of representation a foundation model for team-sport tracking should learn.
| Aspect | TranSPORTmer | Diffoot | CausalTraj | GenTac |
|---|---|---|---|---|
| Output Type | Point | Multi-modal samples | Joint multi-modal | Joint multi-modal + controllable |
| Agents Modelled | 22 + ball | Defenders only (11) | 22 + ball | 22 + ball |
| Conditioning | Past trajectories | Past + graph | Past + causal structure | 5 modes (uncond. / opp. / team / league / obj.) |
| Evaluation | ADE/FDE + impute/classify | ADE/FDE + direction | ADE/FDE + coherence | TacBench (trajectory + event) |
| Cross-Sport | Football / basketball | Football only | Football only | Football, basketball, NFL, hockey |
| Best Use Case | Real-time multi-task | Defensive scouting | Coherent rollouts | Tactical scenario design, "what-if" analysis |
The 0.2s causal window keeps generation tractable but means GenTac is really a stitched short-rollout model rather than a true minute-long tactical simulator. Long horizons compound sampling noise.
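The stitching can be pictured as a short autoregressive loop in which each 5-frame chunk is sampled from the tail of what has been generated so far. The sketch below is illustrative only: `toy_sampler` is a random-walk stand-in for the diffusion sampler, and the context length of 5 frames is an assumption. It is enough to show the spread between independent rollouts growing with horizon.

```python
import numpy as np

def stitched_rollout(context, sample_chunk, n_chunks, context_len=5):
    """Generate n_chunks consecutive chunks, each conditioned on the latest frames."""
    frames = [context]
    for _ in range(n_chunks):
        window = np.concatenate(frames, axis=0)[-context_len:]
        frames.append(sample_chunk(window))
    return np.concatenate(frames[1:], axis=0)   # generated frames only

def toy_sampler(window, rng, chunk_len=5, step_std=0.1):
    """Stand-in sampler: random walk continuing from the last observed frame."""
    steps = rng.normal(scale=step_std, size=(chunk_len,) + window.shape[1:])
    return window[-1] + np.cumsum(steps, axis=0)

rng = np.random.default_rng(0)
context = np.zeros((5, 23, 2))   # 0.2 s of observed play at 25 FPS

# 20 independent rollouts of 10 chunks (2 s) from the same context.
rollouts = np.stack([
    stitched_rollout(context, lambda w: toy_sampler(w, rng), n_chunks=10)
    for _ in range(20)
])
spread = rollouts.std(axis=0).mean(axis=(1, 2))   # per-frame spread across rollouts
spread_early, spread_late = spread[0], spread[-1]
```

In this toy setting the spread at the last frame is several times larger than at the first, which is the compounding effect the limitation describes.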
53.7% sub-type accuracy is impressive, but it means nearly half of generated samples are not recognised as the intended sub-type: fine-grained tactical concepts (e.g. half-space underlap vs. third-man combination) are still partially out of reach.
Multi-step DDIM sampling × K = 20 samples is meaningfully slower than a single-shot point predictor. Fine for off-line scouting; tight for real-time use.
"Team identity" and "objective" are coarse handles. Coaches typically think in terms of structures and triggers; bridging that to a useful conditioning vocabulary is open work.
- DDPM (Ho et al., 2020): Denoising Diffusion Probabilistic Models
- DDIM (Song et al., 2021): Faster deterministic sampling
- Classifier-Free Guidance (Ho & Salimans, 2022): Conditioning trick used by GenTac
- Diffoot (2025): Graph-conditioned diffusion for defensive trajectories
- CausalTraj: Coherent multi-agent forecasting via temporal causality
- TranSPORTmer: Set-attention unified trajectory model
Diffoot showed diffusion works for football trajectories. CausalTraj showed how to keep joint rollouts coherent. TranSPORTmer showed one architecture can do many tasks. GenTac is the natural next step: a single conditional generative model over the whole 11-vs-11 system (22 players plus the ball), with a benchmark that explicitly grades whether the samples are tactically meaningful.