Kalman Filters
How to estimate truth from noisy measurements — the maths behind smooth player tracking, velocity estimation, and sensor fusion in football analytics.
State Estimation · Signal Processing · 30 min read
Why Kalman Filters?

Every piece of tracking data you've ever worked with is wrong. Not by a lot — but GPS chips drift, optical systems lose players behind each other, and even the best sensors add noise. When SkillCorner or Metrica reports a player at (34.2, 18.7), the real position might be (34.5, 18.4). The question is: how do you get the best possible estimate of where a player actually is?

The Problem: Every Sensor Lies

GPS has ±0.5–2m error. Optical tracking loses players during occlusions. Broadcast-derived systems have variable framerate and perspective distortion. If you naively use raw sensor readings, you get jittery trajectories, impossible accelerations, and velocity estimates that spike wildly between frames.

The Insight: You Know More Than the Sensor Does

Here's the key idea. A sensor tells you "the player is at (34.2, 18.7)." But you also know physics: the player was at (33.8, 18.9) last frame, moving at 5 m/s northeast. They can't teleport. Combining what you predict from physics with what you observe from the sensor gives a better answer than either alone.

This is exactly what a Kalman filter does. Invented by Rudolf Kálmán in 1960, it's an algorithm that optimally fuses a model prediction ("where should the player be based on physics?") with a noisy measurement ("where does the sensor say they are?"), weighting each by how uncertain it is.

Used Everywhere
  • Apollo moon navigation
  • GPS receivers
  • Self-driving cars
  • Drone flight controllers
  • Player tracking systems
What It Gives You
  • Smooth trajectories
  • Velocity estimates for free
  • Gap interpolation
  • Uncertainty bounds
  • Sensor fusion
What It Needs
  • A motion model (simple physics)
  • Sensor noise estimate
  • Process noise estimate
  • Initial state guess
  • Linear or linearisable system
The Core Idea: Predict, Then Correct
Two steps, repeated forever

The Kalman filter operates in a loop with exactly two steps. First, you predict where you think the state will be next, using a model (usually simple physics). Then, when a sensor measurement arrives, you correct your prediction by blending it with the measurement. The blend ratio is called the Kalman gain.

[Figure: the predict–update cycle. From an initial state (x̂₀, P₀), the PREDICT step applies the model (x̂⁻ = F·x̂ + B·u, P⁻ = F·P·Fᵀ + Q: "where do I think it is?"); the UPDATE step then corrects with the sensor reading zₖ (K = P⁻Hᵀ(HP⁻Hᵀ + R)⁻¹, x̂ = x̂⁻ + K(z − Hx̂⁻), P = (I − KH)P⁻: "correct with measurement"), and the best estimate feeds back into the next prediction.]
Football Analogy

Imagine you're a commentator tracking a striker. Between camera cuts, you predict where they are: "They were sprinting towards the box at 8 m/s, so they're probably near the penalty spot now." When the camera cuts back, you correct: "Ah, they actually slowed down — they're a few metres behind where I expected." Your brain naturally does predict-update. The Kalman filter does it mathematically optimally.

State Space Models: The Foundation
Separating what's real from what you can see

Before we can filter anything, we need to formalise our problem. Kalman filters operate on state space models — a way of separating the true hidden state (the player's actual position and velocity) from the noisy observations (what the sensor reports).

[Figure: state space model, hidden state drives observations. The hidden states x₀ → x₁ → x₂ → … evolve through the transition F, and each hidden state xₖ emits a noisy measurement zₖ through the observation mapping H.]

The Two Equations

State transition (how the world evolves):
xₖ = F · xₖ₋₁ + B · uₖ + wₖ
Observation (what the sensor sees):
zₖ = H · xₖ + vₖ
xₖ: True state at time k (what we want to know — position, velocity)
F: State transition matrix (physics model — "how does state evolve?")
B: Control input matrix (optional external forces)
uₖ: Control input (e.g., acceleration command)
wₖ: Process noise ~ N(0, Q) — unpredictable state changes
zₖ: Measurement (what the sensor reports)
H: Observation matrix (maps state to measurement space)
vₖ: Measurement noise ~ N(0, R) — sensor inaccuracy

Concrete Example: Tracking a Player

For a footballer moving on a 2D pitch, a common state vector tracks position and velocity:

State vector (what we estimate):
x = [x, y, vₓ, vᵧ]ᵀ
position (x, y) in metres + velocity (vₓ, vᵧ) in m/s
State transition (constant velocity model, dt = 1/25 s):
F = [1  0  dt  0]
    [0  1  0  dt]
    [0  0  1   0]
    [0  0  0   1]
x_new = x_old + vₓ·dt, y_new = y_old + vᵧ·dt, velocities unchanged
Observation matrix (sensor only sees position, not velocity):
H = [1  0  0  0]
    [0  1  0  0]
GPS gives (x, y) but not (vₓ, vᵧ) — the filter infers velocity!
Key Insight: Free Velocity

Notice the sensor only reports position, but the state includes velocity. The Kalman filter infers velocity from position changes, giving you smooth speed/acceleration estimates for free. This is far better than naive finite differences (vₓ ≈ Δx/Δt) which amplify noise.
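The matrices above can be written down directly in numpy. A minimal sketch, assuming the 25 Hz frame rate and the [x, y, vₓ, vᵧ] state layout from the text (the player's numbers are illustrative):

```python
import numpy as np

dt = 1 / 25  # 25 Hz tracking -> 0.04 s between frames

# State transition: position advances by velocity * dt, velocity unchanged.
F = np.array([[1., 0., dt, 0.],
              [0., 1., 0., dt],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])

# Observation: the sensor reports (x, y) only; velocity stays hidden.
H = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.]])

# One frame of pure physics: player at (33.8, 18.9) moving (5, 2) m/s.
x = np.array([33.8, 18.9, 5.0, 2.0])
x_next = F @ x           # [34.0, 18.98, 5.0, 2.0]
z_expected = H @ x_next  # what the sensor should report next frame
```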

Step 1: Predict
Project the state forward using physics

The predict step uses your motion model to project the current state estimate forward in time. No sensor data is used — this is pure physics. You also need to project the uncertainty forward, because predictions become less certain over time.

Predicted state estimate:
x̂ₖ⁻ = F · x̂ₖ₋₁ + B · uₖ
"Take last best estimate, apply physics to get predicted next state"
Predicted error covariance:
Pₖ⁻ = F · Pₖ₋₁ · Fᵀ + Q
"Uncertainty grows: old uncertainty transformed by physics, plus process noise Q"
x̂ₖ⁻: Predicted state (the ⁻ means "before seeing measurement")
Pₖ⁻: Predicted covariance (how uncertain we are about the prediction)
Q: Process noise — models unpredictable accelerations, direction changes
F: State transition matrix — encodes your physics model
Football Example

A midfielder is at (40.0, 25.0) moving at (3.0, 1.5) m/s. At 25 Hz (dt = 0.04s), the prediction step says: "Next frame they should be at (40.12, 25.06)." The Q matrix accounts for the fact that footballers don't move at constant velocity — they accelerate, decelerate, and change direction. A higher Q means "I expect the player to be unpredictable" (e.g., during a dribble).
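The two predict equations translate to a few lines of numpy. A sketch using the midfielder's numbers from the example; the Q and P values here are illustrative placeholders:

```python
import numpy as np

def predict(x, P, F, Q):
    """Project state and covariance one frame forward (no sensor data)."""
    x_pred = F @ x              # x-hat-minus = F*x-hat (no control input B*u here)
    P_pred = F @ P @ F.T + Q    # P-minus = F*P*F^T + Q: uncertainty grows
    return x_pred, P_pred

dt = 0.04
F = np.array([[1., 0., dt, 0.],
              [0., 1., 0., dt],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
Q = np.eye(4) * 0.01                   # illustrative process noise
x = np.array([40.0, 25.0, 3.0, 1.5])   # midfielder from the example
P = np.eye(4) * 0.5                    # current uncertainty

x_pred, P_pred = predict(x, P, F, Q)
# x_pred is [40.12, 25.06, 3.0, 1.5]: "next frame near (40.12, 25.06)"
```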

Step 2: Update (Correct)
Blend prediction with measurement

When a sensor measurement zₖ arrives, we correct our prediction. The magic is in the Kalman gain K — it automatically decides how much to trust the prediction vs. the measurement, based on their relative uncertainties.

1. Innovation (prediction error — how wrong were we?):
ỹₖ = zₖ − H · x̂ₖ⁻
"Sensor says z, I predicted Hx̂⁻ — the difference is the surprise"
2. Innovation covariance:
Sₖ = H · Pₖ⁻ · Hᵀ + R
"Total uncertainty in the innovation: prediction uncertainty + sensor noise"
3. Kalman gain (the magic ratio):
Kₖ = Pₖ⁻ · Hᵀ · Sₖ⁻¹
"How much should I adjust? High if prediction is uncertain, low if sensor is noisy"
4. Updated state estimate:
x̂ₖ = x̂ₖ⁻ + Kₖ · ỹₖ
"Take prediction, add Kalman gain × surprise = best estimate"
5. Updated error covariance:
Pₖ = (I − Kₖ · H) · Pₖ⁻
"Uncertainty shrinks — we're more confident after incorporating the measurement"
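The five update equations, as a numpy sketch. A diagonal P⁻ is used here purely for illustration; in a real run the predict step supplies it, cross terms and all, and those cross terms are what let a position measurement correct the velocity:

```python
import numpy as np

def update(x_pred, P_pred, z, H, R):
    """Correct a prediction with measurement z."""
    y = z - H @ x_pred                      # innovation (the "surprise")
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y                  # prediction + gain * surprise
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred  # uncertainty shrinks
    return x_new, P_new

H = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.]])
R = np.eye(2) * 0.25                        # GPS-like position noise
x_pred = np.array([40.12, 25.06, 3.0, 1.5])
P_pred = np.eye(4) * 0.5                    # illustrative, diagonal

z = np.array([40.3, 24.9])                  # sensor disagrees slightly
x_new, P_new = update(x_pred, P_pred, z, H, R)
# the updated position lands between the prediction and the measurement
```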

Understanding the Kalman Gain

The Kalman gain K is the heart of the filter. It's a number between 0 and 1 (for scalar systems) that answers: "how much should I trust the sensor vs. my prediction?"

[Figure: Kalman gain, who do you trust more? K ≈ 0 (sensor very noisy): the estimate stays near the prediction x̂⁻. K ≈ 0.5 (similar uncertainty): the estimate splits the difference. K ≈ 1 (model very uncertain): the estimate jumps to the measurement z.]
K → 0 (Trust Model)

Sensor is very noisy (R is large) relative to prediction uncertainty (P⁻ is small). The filter mostly ignores the measurement.

Example: GPS glitch reads player at 80m when they're near centre — filter says "nah, I'll stick with my prediction"
K → 1 (Trust Sensor)

Prediction is very uncertain (P⁻ is large) relative to sensor noise (R is small). The filter jumps to the measurement.

Example: Player has been occluded for 2 seconds (P⁻ grew huge), then reappears — filter snaps to sensor
This Is Provably Optimal

For linear systems with Gaussian noise, the Kalman filter gives the minimum variance unbiased estimate. No other linear filter can do better. This is why it's been used for everything from Moon landings to your phone's GPS.

Putting It All Together
The complete algorithm
Kalman Filter Algorithm
INIT: Set initial state x̂₀ and covariance P₀
For each new frame k = 1, 2, 3, ...
P1: Predict state: x̂ₖ⁻ = F · x̂ₖ₋₁ + B · uₖ
P2: Predict covariance: Pₖ⁻ = F · Pₖ₋₁ · Fᵀ + Q
U1: Kalman gain: Kₖ = Pₖ⁻ · Hᵀ · (H · Pₖ⁻ · Hᵀ + R)⁻¹
U2: Update state: x̂ₖ = x̂ₖ⁻ + Kₖ · (zₖ − H · x̂ₖ⁻)
U3: Update covariance: Pₖ = (I − Kₖ · H) · Pₖ⁻
Repeat. Each iteration takes microseconds.

That's it. Five equations, repeated every frame. The elegance is that all the complexity — sensor noise, motion uncertainty, variable trust — is captured in these matrix operations. The filter automatically adapts: when the sensor is reliable, K is high and it tracks measurements closely. When the sensor is noisy or missing, K drops and the filter coasts on its physics model.
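The whole loop fits in a dozen lines. A self-contained sketch on synthetic data: a player moving at a constant (4, 1) m/s, observed through GPS-like noise of ±0.5 m (the trajectory, noise levels, and starting point are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.04
F = np.array([[1., 0., dt, 0.], [0., 1., 0., dt],
              [0., 0., 1., 0.], [0., 0., 0., 1.]])
H = np.array([[1., 0., 0., 0.], [0., 1., 0., 0.]])
Q = np.eye(4) * 1e-3   # small process noise: expect smooth motion
R = np.eye(2) * 0.25   # +/-0.5 m GPS noise

# Synthetic truth: constant velocity (4, 1) m/s from (10, 30), noisy fixes.
ks = np.arange(100)
truth = np.stack([10 + 4.0 * ks * dt, 30 + 1.0 * ks * dt], axis=1)
zs = truth + rng.normal(0.0, 0.5, truth.shape)

x = np.array([zs[0, 0], zs[0, 1], 0.0, 0.0])  # init from first fix; velocity unknown
P = np.eye(4) * 10.0                          # large initial uncertainty

for z in zs[1:]:
    x, P = F @ x, F @ P @ F.T + Q                          # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                         # gain
    x, P = x + K @ (z - H @ x), (np.eye(4) - K @ H) @ P    # update

# x[2:] now holds an inferred velocity close to the true (4, 1) m/s,
# even though the "sensor" never reported velocity at all.
```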

[Figure: player tracking on pitch coordinates, comparing the true path, raw GPS (noisy), and the Kalman-filtered trajectory (smooth).]
Tuning: The Q and R Matrices
The art behind the science

The Kalman filter is optimal if Q and R are correct. In practice, you rarely know the exact noise values. Tuning these is where the engineering judgement comes in.

R — Measurement Noise

How much you distrust the sensor.

  • Large R: Smoother output, slower to react — filter "ignores" jitter
  • Small R: Responsive but jittery — trusts every reading
For GPS tracking: R ≈ diag(0.25, 0.25) (±0.5m std)
For optical: R ≈ diag(0.01, 0.01) (±0.1m std)
Q — Process Noise

How much the player deviates from constant velocity.

  • Large Q: Expects lots of acceleration — more responsive to changes
  • Small Q: Assumes smooth motion — over-smooths sudden turns
Typical: model Q from expected max acceleration (~4 m/s² for sprinting)
Some systems use adaptive Q that increases during high-acceleration phases
The Q/R Trade-off

The ratio Q/R is what really matters. High Q/R = trust sensor more, track aggressively. Low Q/R = trust model more, smooth aggressively. For football tracking, you typically want moderate smoothing — enough to remove GPS jitter, but responsive enough to catch a sudden change of direction.
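One common way to set Q from an acceleration budget is the white-noise-acceleration model: assume unmodelled acceleration with standard deviation σₐ and propagate it through the kinematics. A sketch, using the ~4 m/s² sprint figure quoted above:

```python
import numpy as np

def q_white_accel(dt, sigma_a):
    """Per-axis [position, velocity] process noise from white acceleration."""
    G = np.array([[0.5 * dt**2],   # acceleration's effect on position
                  [dt]])           # acceleration's effect on velocity
    return (G @ G.T) * sigma_a**2

dt, sigma_a = 0.04, 4.0            # 25 Hz, ~max sprint acceleration
Qa = q_white_accel(dt, sigma_a)

# Assemble the full 4x4 Q for the state [x, y, vx, vy]:
Q = np.zeros((4, 4))
Q[np.ix_([0, 2], [0, 2])] = Qa     # x-axis block (x, vx)
Q[np.ix_([1, 3], [1, 3])] = Qa     # y-axis block (y, vy)
```

Scaling σₐ up makes the filter expect erratic motion (responsive but less smooth); scaling it down assumes smooth running, matching the trade-off described above.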

Beyond Linear: The Extended Kalman Filter
Handling nonlinear motion models

The standard Kalman filter assumes everything is linear: state transitions, observations, Gaussian noise. But real football motion is nonlinear — players turn, accelerate in curves, and the relationship between raw camera coordinates and pitch coordinates involves perspective transforms.

The EKF Trick: Linearise Locally

The Extended Kalman Filter (EKF) handles nonlinear systems by linearising the model at each time step using a first-order Taylor expansion (i.e., the Jacobian). Instead of fixed matrices F and H, you compute their Jacobians at the current state estimate.

Nonlinear state transition:
xₖ = f(xₖ₋₁, uₖ) + wₖ
EKF predict (linearise f around x̂ₖ₋₁):
x̂ₖ⁻ = f(x̂ₖ₋₁, uₖ)
Fₖ = ∂f/∂x evaluated at x̂ₖ₋₁ (Jacobian of f)
Then proceed with standard Kalman equations using Fₖ instead of fixed F.
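A sketch of EKF-style linearisation, using a finite-difference Jacobian and a hypothetical heading-and-speed motion model (the model and its numbers are illustrative, not from the text):

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Numerical Jacobian of f at x via central differences."""
    fx = f(x)
    J = np.zeros((len(fx), len(x)))
    for i in range(len(x)):
        d = np.zeros(len(x)); d[i] = eps
        J[:, i] = (f(x + d) - f(x - d)) / (2 * eps)
    return J

dt = 0.04

def f(s):
    """Nonlinear transition for a state [x, y, speed, heading]."""
    x, y, v, th = s
    return np.array([x + v * np.cos(th) * dt,
                     y + v * np.sin(th) * dt,
                     v, th])

s = np.array([40.0, 25.0, 5.0, np.pi / 4])   # 5 m/s, heading north-east
s_pred = f(s)        # EKF predict: propagate through f itself
Fk = jacobian(f, s)  # linearise: Fk = df/dx at the current estimate
# Fk then replaces F in the covariance equations: P- = Fk P Fk^T + Q
```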

Other Variants

UKF

Unscented Kalman Filter. Uses sigma points instead of Jacobians — better for highly nonlinear systems. No derivatives needed.

Particle Filter

Uses Monte Carlo sampling. Handles arbitrary distributions (not just Gaussian). Computationally expensive but very flexible.

Kalman Smoother

Runs forward and backward through data. Uses future information too. Perfect for offline processing of tracking data.

Which to Use for Football?

For most football tracking, the standard Kalman filter with a constant-velocity model works surprisingly well. Players move roughly linearly between 25 Hz frames (0.04s). Use EKF if you're fusing sensors with nonlinear observation models (e.g., raw camera pixel coordinates → pitch coordinates). Use the Kalman smoother for offline analysis when you have the complete match data.

Football Applications
Kalman filters on the pitch

Kalman filters are everywhere in football analytics infrastructure — you're almost certainly using Kalman-filtered data even if you don't realise it:

[Figure: Kalman filter in a football tracking pipeline. Raw sensor data (GPS/optical at 25 Hz, with jitter, gaps, and ±0.5 m error) flows into a per-player, per-frame Kalman filter over the state [x, y, vₓ, vᵧ], producing clean tracks (smooth positions, velocity estimates, gap interpolation, uncertainty bounds) that feed downstream models: xG models, STGNNs, foundation models. This is what you actually train on from providers like SkillCorner, Metrica, and DFL.]
1. Player Position Smoothing

The most direct application. Run one Kalman filter per player (and one for the ball). State = [x, y, vₓ, vᵧ]. The filter removes GPS jitter, interpolates through short dropouts, and provides smooth trajectories suitable for downstream models.

State: [x, y, vₓ, vᵧ] | Measurement: GPS (x, y) | Model: Constant velocity | Result: Smooth position + velocity
2. Ball Tracking

Ball tracking is much harder than player tracking — the ball is small, fast, and often occluded. A Kalman filter with a ballistic motion model (including gravity for aerial balls) can maintain estimates during occlusions and predict where the ball will land.

State: [x, y, z, vₓ, vᵧ, v_z] | Model: Ballistic (gravity + drag) | Challenge: State changes at kicks/bounces → high Q or reset
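A sketch of the ballistic predict step, with gravity entering as a known control input on the vertical axis (drag is omitted for brevity, and the ball's state values are illustrative):

```python
import numpy as np

dt, g = 0.04, 9.81
# State [x, y, z, vx, vy, vz]: positions advance by velocity * dt ...
F = np.eye(6)
F[0, 3] = F[1, 4] = F[2, 5] = dt
# ... while gravity pulls z and vz down as a known control term B*u.
Bu = np.array([0., 0., -0.5 * g * dt**2, 0., 0., -g * dt])

ball = np.array([50.0, 30.0, 2.0, 10.0, 0.0, 5.0])  # lofted pass
ball_pred = F @ ball + Bu
# z still rises (vz is positive) while vz itself shrinks each frame
```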
3. Multi-Object Tracking (MOT)

In computer vision pipelines (like the video analysis pipeline), Kalman filters are the backbone of tracking-by-detection. Each detected bounding box is a measurement. The Kalman filter predicts where each player should be in the next frame, then Hungarian assignment matches predictions to detections.

Used in: SORT, DeepSORT, ByteTrack | State: [x, y, w, h, vₓ, vᵧ] in image space | Key benefit: Handles missed detections
4. Sensor Fusion

Some systems combine multiple sensors: GPS + accelerometer, or optical tracking + LPS (Local Positioning System). The Kalman filter naturally fuses these by giving each sensor its own observation matrix H and noise R. More sensors = lower uncertainty.

Example: GPS at 10 Hz (R large) + accelerometer at 100 Hz (R small for acceleration, no position) → fused estimate better than either
5. Speed & Acceleration Estimation

Rather than computing speed as Δposition/Δtime (which amplifies noise), the Kalman filter state includes velocity directly. You can extend it to include acceleration too: state = [x, y, vₓ, vᵧ, aₓ, aᵧ]. This gives reliable speed, acceleration, and even jerk estimates for physical load monitoring.

Why it matters: Naive Δv/Δt at 25 Hz gives ±2 m/s² noise. Kalman-filtered acceleration has ±0.3 m/s² noise. Critical for sprint detection and injury risk.
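Extending the transition matrix to a constant-acceleration model is mechanical. A sketch for the state [x, y, vₓ, vᵧ, aₓ, aᵧ]:

```python
import numpy as np

def F_const_accel(dt):
    """Transition matrix for the state [x, y, vx, vy, ax, ay]."""
    F = np.eye(6)
    for i in (0, 1):                 # one copy per axis
        F[i, i + 2] = dt             # position += velocity * dt
        F[i, i + 4] = 0.5 * dt**2    # position += 0.5 * accel * dt^2
        F[i + 2, i + 4] = dt         # velocity += accel * dt
    return F

F = F_const_accel(0.04)   # 25 Hz version; H still picks out (x, y) only
```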
6. Data Quality & Anomaly Detection

The innovation ỹₖ = zₖ − Hx̂ₖ⁻ tells you how surprising each measurement is. Large innovations (normalised by Sₖ) indicate sensor glitches, ID swaps, or genuine sudden events. You can flag these for manual review or automatic correction.

Gate test: If ỹₖᵀ Sₖ⁻¹ ỹₖ > χ²₀.₉₉, reject measurement as outlier. The filter coasts on prediction instead.
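The gate test is a few lines. A sketch: 9.21 is the 0.99 quantile of χ² with 2 degrees of freedom (one per measured coordinate), and the glitch value echoes the 80 m GPS example above:

```python
import numpy as np

def gate(z, x_pred, P_pred, H, R, thresh=9.21):
    """Accept z only if its normalised innovation is plausible."""
    y = z - H @ x_pred                     # innovation
    S = H @ P_pred @ H.T + R               # innovation covariance
    d2 = float(y @ np.linalg.inv(S) @ y)   # squared Mahalanobis distance
    return d2 <= thresh                    # False -> coast on prediction

H = np.array([[1., 0., 0., 0.], [0., 1., 0., 0.]])
R = np.eye(2) * 0.25
x_pred = np.array([40.12, 25.06, 3.0, 1.5])
P_pred = np.eye(4) * 0.5

ok = gate(np.array([40.3, 24.9]), x_pred, P_pred, H, R)   # plausible fix
bad = gate(np.array([80.0, 25.0]), x_pred, P_pred, H, R)  # GPS glitch
# ok is True; bad is False, so the glitch is rejected
```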
Kalman Filters vs. Neural Networks
Classical meets modern

If you've read the earlier articles in this series on RNNs and Transformers, you might notice something familiar. The Kalman filter's predict-update cycle is structurally very similar to an RNN — both maintain a hidden state that gets updated with each new observation.

Kalman Filter
  • Explicit physics model (F, H)
  • Optimal for linear + Gaussian
  • Interpretable: you know what each state means
  • No training data needed
  • Runs in microseconds
  • One filter per player (independent)
Neural Network (RNN/Transformer)
  • Learned dynamics (no explicit physics)
  • Handles nonlinear, non-Gaussian patterns
  • Black box: hidden state is opaque
  • Needs lots of training data
  • Slower inference
  • Can model inter-player interactions
They're Complementary, Not Competing

In practice, you use both. The Kalman filter is your preprocessing layer — clean raw sensor data into smooth trajectories. Then feed that clean data into your neural network (STGNN, foundation model, etc.) for high-level analysis. The Kalman filter handles the physics; the neural network handles the tactics.

Modern Hybrid: Learned Kalman Filters

Recent research replaces the hand-tuned Q and R matrices with neural networks that predict them. The Kalman structure (predict-update with gain) is kept for its optimality guarantees, but the noise parameters adapt to context. A player sprinting gets different Q than one standing still. This is an active research frontier.

Summary
What You Learned
  • ✓ Why raw tracking data needs filtering (sensor noise, gaps)
  • ✓ State space models: hidden state vs. observations
  • ✓ The predict-update cycle
  • ✓ All five Kalman filter equations
  • ✓ Kalman gain: automatic trust weighting
  • ✓ Tuning Q and R for football tracking
  • ✓ Extended Kalman Filter for nonlinear systems
  • ✓ Six football applications
Key Equations
x̂ₖ⁻ = F · x̂ₖ₋₁
Pₖ⁻ = F · Pₖ₋₁ · Fᵀ + Q
Kₖ = Pₖ⁻Hᵀ(HPₖ⁻Hᵀ + R)⁻¹
x̂ₖ = x̂ₖ⁻ + Kₖ(zₖ − Hx̂ₖ⁻)
Pₖ = (I − KₖH)Pₖ⁻
Key Takeaway

The Kalman filter is the unsung hero of football analytics. Every time you load "clean" tracking data from SkillCorner, Metrica, or any provider, a Kalman filter (or something very like it) has already smoothed the noise, filled the gaps, and estimated the velocities. Understanding it gives you intuition for why tracking data looks the way it does, when to trust it, and how to improve it when you're working with noisier sources.