In the previous articles, we learned about CNNs for spatial patterns in grids and RNNs for temporal patterns in sequences. But what about data that doesn't fit neatly into a grid or a sequence?
Consider a football match with 22 players on the pitch. Each player interacts with multiple others — passing, pressing, marking. These relationships form a network, not a grid (like an image) or a chain (like a sentence). CNNs assume regular spatial structure. RNNs assume a single sequence. Neither captures the arbitrary connectivity between entities in a network.
Many real-world systems are naturally described as graphs — collections of entities (nodes) connected by relationships (edges). Social networks, molecules, road systems, and yes, football teams. To learn from this data, we need architectures that respect the graph structure.
Examples of Graph-Structured Data
- Social networks (users + friendships)
- Molecules (atoms + bonds)
- Citation networks (papers + references)
- Road networks (intersections + roads)
- Knowledge graphs (entities + relations)
- Players on pitch (players + interactions)
- Passing networks (players + passes)
- Team formations (positions + connections)
- Pressing structures (defenders + coverage)
- Tactical relationships (roles + dependencies)
Graph Neural Networks solve this by operating directly on graph-structured data. They learn to combine information from connected nodes, allowing each entity to understand its context within the network.
Before diving into GNNs, let's make sure we understand what a graph is. A graph is simply a way to represent entities and their relationships.
Formal Definition

A graph G = (V, E) consists of a set of nodes V and a set of edges E, where each edge connects a pair of nodes. Connectivity is commonly stored as an adjacency matrix A, with A_ij = 1 if nodes i and j are connected and 0 otherwise.
Additional Graph Components

Beyond nodes and edges, graphs typically carry node features (a feature vector x_i per node, such as a player's position and velocity) and, optionally, edge features (such as the number of passes between two players).
Types of Graphs
- Undirected: edges have no direction — if A connects to B, then B connects to A. Example: players within 10 meters of each other.
- Directed: edges have direction — A→B doesn't mean B→A. Example: a passing network (passes go one direction).
- Weighted: edges have weights representing strength/importance. Example: the number of passes between players.
- Dynamic: structure changes over time. Example: player positions and interactions evolve during a match.
In football analytics, we typically construct graphs where nodes = players and edges = interactions. Edges can be defined by: (1) spatial proximity (within X meters), (2) passing relationships, (3) marking assignments, or (4) fully connected within teams. Node features include position (x, y), velocity, acceleration, team, and role.
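As a concrete sketch of option (1), here is a minimal proximity-based graph builder in plain Python. The player names, coordinates, and the 25-meter radius are illustrative assumptions, not values from the article:

```python
import math

# Illustrative player coordinates (x, y) in meters; the names, positions,
# and the 25 m radius below are made-up assumptions for this sketch.
players = {
    "GK": (5.0, 34.0),
    "CB": (20.0, 30.0),
    "CM": (50.0, 50.0),
    "ST": (70.0, 50.0),
}

def proximity_graph(positions, radius=25.0):
    """Add an undirected edge between any two players within `radius` meters."""
    names = list(positions)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (ax, ay), (bx, by) = positions[a], positions[b]
            if math.hypot(ax - bx, ay - by) <= radius:
                edges.add((a, b))
                edges.add((b, a))  # store both directions: the graph is undirected
    return edges

edges = proximity_graph(players)
```

With these toy coordinates, the GK and CB end up connected, as do the CM and ST, while the CB and CM are too far apart; a different radius would give a different graph, which is itself a modeling choice.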
The fundamental operation in GNNs is message passing (also called neighborhood aggregation). The idea is beautifully simple: each node updates its representation by gathering and combining information from its neighbors.
In a football context: a central midfielder's representation should incorporate information about nearby teammates (passing options), nearby opponents (pressing threats), and the ball position. After message passing, the CM "knows" about its local context — who's around, where the space is, what options exist.
The Three Steps of Message Passing
1. Message: each neighbor creates a "message" — a transformed version of its features to send to the target node.
2. Aggregate: combine messages from all neighbors into a single aggregated message. The aggregation must be permutation invariant (order doesn't matter).
3. Update: combine the node's own features with the aggregated neighbor message to produce an updated representation.
The General Message Passing Formula

h_i^(l+1) = UPDATE( h_i^(l), AGGREGATE({ MESSAGE(h_j^(l)) : j ∈ N(i) }) )

Here h_i^(l) is node i's representation at layer l, N(i) is the set of i's neighbors, and MESSAGE, AGGREGATE, and UPDATE correspond to the three steps above.
The aggregation function must give the same result regardless of the order we process neighbors. If a CM has neighbors [CB, RB, LW, RW], aggregating in order [CB, RB, LW, RW] must equal [LW, CB, RW, RB]. Sum, mean, and max all have this property. This is crucial because graphs have no natural ordering of nodes!
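The three steps can be sketched in plain Python on a fully connected triangle of players. The concrete choices below (identity messages, mean aggregation, and averaging a node's own features with the aggregated message) are illustrative placeholders for the learned functions a real GNN would use:

```python
# Toy 2-D features per node (e.g., pitch position); all numbers illustrative.
features = {
    "CB": [20.0, 30.0],
    "CM": [50.0, 50.0],
    "ST": [70.0, 50.0],
}
# Fully connected triangle of three players.
neighbors = {"CB": ["CM", "ST"], "CM": ["CB", "ST"], "ST": ["CB", "CM"]}

def message_passing_step(feats, nbrs):
    updated = {}
    for node, own in feats.items():
        msgs = [feats[n] for n in nbrs[node]]                    # 1. message (identity)
        agg = [sum(d) / len(msgs) for d in zip(*msgs)]           # 2. aggregate (mean)
        updated[node] = [(o + a) / 2 for o, a in zip(own, agg)]  # 3. update (average)
    return updated

h1 = message_passing_step(features, neighbors)
```

Because the mean in step 2 ignores the order of `nbrs[node]`, shuffling the neighbor lists leaves `h1` unchanged, which is exactly the permutation invariance the text requires.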
After one message passing layer, each node knows about its immediate neighbors (1-hop). But what about nodes further away? By stacking multiple layers, information propagates further through the graph.
- After layer 1: each node has aggregated info from its 1-hop neighbors. The CM knows about adjacent players.
- After layer 2: each node now has info about 2-hop neighbors. The CM knows about its neighbors' neighbors.
- After layer L: each node has aggregated info from all nodes within L hops. The CM understands the broader tactical picture.
Don't stack too many layers! With many layers, all nodes end up with similar representations because they've all aggregated information from the entire graph. This is called over-smoothing. For most tasks, 2-4 layers is optimal. In football graphs with ~22 nodes, even 2-3 layers often covers the full graph.
Think of message passing like players communicating during a match. In one "round" (layer), each player talks to their immediate neighbors. After two rounds, information about the whole defensive line has reached the striker via the midfield. After three rounds, everyone has a sense of the full team shape. More rounds just add noise.
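This hop-by-hop propagation can be seen numerically on a three-node chain (CB - CM - ST). The update rule below, mean aggregation averaged with the node's own value, is an illustrative placeholder, not a specific architecture:

```python
# Mark the CB with a feature of 1 and watch the signal travel along the
# chain CB - CM - ST, one hop per message-passing round.
nbrs = {"CB": ["CM"], "CM": ["CB", "ST"], "ST": ["CM"]}

def step(feats):
    out = {}
    for node, own in feats.items():
        agg = sum(feats[n][0] for n in nbrs[node]) / len(nbrs[node])  # mean over neighbors
        out[node] = [(own[0] + agg) / 2]                              # mix with own value
    return out

h0 = {"CB": [1.0], "CM": [0.0], "ST": [0.0]}
h1 = step(h0)  # after round 1, the ST still sees nothing from the CB
h2 = step(h1)  # after round 2, the CB's signal has reached the ST via the CM
```

After one round the ST's feature is still zero (the CB is two hops away); after two rounds it is nonzero, because the CM relayed the information.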
The choice of aggregation function significantly impacts what the GNN can learn. Different functions have different strengths and weaknesses.
- Sum: adds up all neighbor features. Preserves information about how many neighbors have certain features.
- Mean: averages neighbor features. Normalizes by node degree, making it robust to varying connectivity.
- Max: takes the maximum value per feature. Captures the most extreme/important neighbor.
- Attention: learns importance weights for each neighbor. Most expressive but most expensive.
Rule of thumb: use SUM when neighbor count matters (e.g., "surrounded by opponents"), MEAN for general-purpose, stable training, MAX when you care about extremes, and ATTENTION when different neighbors have different importance and you have enough data to learn the weights.
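A quick sketch of how the three non-learned aggregators behave on the same toy neighbor features:

```python
# Three toy 2-D neighbor feature vectors, aggregated three different ways.
neighbors = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

cols = list(zip(*neighbors))                # one tuple per feature dimension
agg_sum = [sum(c) for c in cols]            # [6.0, 6.0]: grows with neighbor count
agg_mean = [sum(c) / len(c) for c in cols]  # [2.0, 2.0]: normalized by degree
agg_max = [max(c) for c in cols]            # [3.0, 4.0]: most extreme value per dim
```

Doubling the neighbor list would double `agg_sum` but leave `agg_mean` and `agg_max` unchanged, which is exactly the trade-off the rule of thumb describes.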
Different GNN architectures implement message passing with different design choices. Here are the most influential ones:
- GCN (Graph Convolutional Network): the foundational modern GNN. Uses normalized mean aggregation with self-loops.
- GraphSAGE: samples a fixed number of neighbors instead of using all of them. Enables mini-batch training on huge graphs.
- GAT (Graph Attention Network): learns attention weights for each edge, allowing the model to focus on important neighbors.
- GIN (Graph Isomorphism Network): provably as powerful as the Weisfeiler-Lehman (WL) graph isomorphism test. Uses sum aggregation with an MLP update.
The Graph Convolutional Network (GCN) is the most widely used GNN architecture. Let's break down exactly how it works with a step-by-step example.
The GCN Layer Equation

H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )

Symbol Definitions

- Ã = A + I: the adjacency matrix with self-loops added, so each node also receives its own features
- D̃: the diagonal degree matrix of Ã, with D̃_ii = Σ_j Ã_ij
- H^(l): the matrix of node features at layer l (one row per node)
- W^(l): the learnable weight matrix of layer l
- σ: a non-linearity, typically ReLU
Worked Example: GCN on a Triangle

Consider three fully connected players with 2-D input features (pitch position):

- h_1 = [20, 30] (CB)
- h_2 = [50, 50] (CM)
- h_3 = [70, 50] (ST)

The adjacency matrix A has one row per node:

- [0, 1, 1] (CB)
- [1, 0, 1] (CM)
- [1, 1, 0] (ST)

Adding self-loops gives Ã = A + I, so every node has degree 3 and D̃ = diag([3, 3, 3]).

Aggregation for the CM (node 2): each neighbor, including the self-loop, contributes equally with weight 1/√(d_i × d_j) = 1/√(3 × 3) = 1/3:

h_2^agg = (1/3)×[20, 30] + (1/3)×[50, 50] + (1/3)×[70, 50]
h_2^agg = [6.7 + 16.7 + 23.3, 10 + 16.7 + 16.7] = [46.7, 43.3]

Finally, h_2^(1) = ReLU(h_2^agg W^(0)): W transforms the result to the new feature dimension and ReLU adds non-linearity.
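The aggregation step of the triangle example can be checked numerically. The code below applies the symmetric normalization 1/√(d_i × d_j) to the triangle graph with self-loops:

```python
import math

# Triangle graph from the worked example: adjacency with self-loops,
# node degrees, and the input features for CB, CM, ST.
A_tilde = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]     # A + I on a triangle
deg = [sum(row) for row in A_tilde]             # [3, 3, 3]
H = [[20.0, 30.0], [50.0, 50.0], [70.0, 50.0]]  # CB, CM, ST

# h_i^agg = sum over j of (A_tilde[i][j] / sqrt(d_i * d_j)) * h_j
H_agg = []
for i in range(3):
    row = [0.0, 0.0]
    for j in range(3):
        w = A_tilde[i][j] / math.sqrt(deg[i] * deg[j])
        row = [r + w * h for r, h in zip(row, H[j])]
    H_agg.append(row)
# H_agg[1] is the CM's aggregated feature, roughly [46.7, 43.3]
```

The multiplication by W^(0) and the ReLU are omitted here because the weights are learned; the normalization is the part worth verifying by hand.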
Sometimes we want to predict properties of individual nodes (e.g., "will this player receive a pass?"). Other times, we want to predict properties of the entire graph (e.g., "will this attacking situation result in a goal?"). For graph-level predictions, we need a readout (or pooling) function.
Types of Graph Tasks
- Node-level: predict for each node. Output: N predictions.
- Edge-level: predict for each edge. Output: E predictions.
- Graph-level: predict for the entire graph. Output: 1 prediction.
Readout Functions
- Mean/sum pooling: simple aggregation of all node features. Works well for many tasks.
- Hierarchical pooling: progressively coarsens the graph (cluster nodes, pool, repeat). Captures multi-scale structure.
- Attention readout: learns which nodes are most important for the graph-level prediction.
- LSTM readout (Set2Set): uses an LSTM to iteratively attend to nodes, building a summary. More expressive than simple aggregation.
To predict xG for a shot situation: (1) Build graph with shooter + nearby players as nodes, (2) Run GNN layers to let each player "understand" their context, (3) Apply readout (e.g., attention focusing on shooter and goalkeeper), (4) Feed graph representation to MLP → xG probability.
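The simplest readout, mean pooling, is a few lines of Python. The node features below are made-up stand-ins for the output of the final GNN layer:

```python
# Made-up final-layer node features for, say, shooter, goalkeeper, defender.
node_features = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]

def mean_readout(feats):
    """Average all node features into a single vector for the whole graph."""
    dims = list(zip(*feats))
    return [sum(d) / len(d) for d in dims]

graph_vec = mean_readout(node_features)
# graph_vec would then be fed to an MLP that outputs the xG probability
```

Like the aggregation inside a GNN layer, the readout must be permutation invariant: relabeling the players must not change `graph_vec`.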
GNNs are trained using the same principles as other neural networks: forward pass, compute loss, backpropagate, update weights. The main difference is that the forward pass involves the graph structure.
Common Loss Functions
- Node classification: cross-entropy loss on labeled nodes
- Regression: MSE on continuous targets
- Link prediction: binary cross-entropy on edge existence
- Graph classification: cross-entropy on graph-level labels
Practical Training Tips
Add residual connections (h^(l+1) = GNN_layer(h^(l)) + h^(l)) to help gradient flow and reduce over-smoothing.
Apply LayerNorm after each GNN layer to stabilize training.
Randomly drop edges during training (DropEdge) to prevent overfitting to graph structure.
For small graphs (like 22 players), 2-3 layers typically covers the full graph. More layers can hurt.
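The residual-connection tip can be sketched as a simple wrapper; `toy_layer` below is a stand-in for a real message-passing layer:

```python
def with_residual(layer, h):
    """Apply `layer` to features h and add the input back: layer(h) + h."""
    return [old + new for old, new in zip(h, layer(h))]

toy_layer = lambda h: [2.0 * x for x in h]    # stand-in, not a real GNN layer
h_out = with_residual(toy_layer, [1.0, 2.0])  # [3.0, 6.0]
```

Because the input is added back unchanged, each node's own signal survives even when the layer's output drifts toward the graph average, which is why residuals help against over-smoothing.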
GNNs are particularly well-suited for football analytics because football is fundamentally about relationships between players. Here are the main applications:
- Trajectory prediction: predict where each player will move in the next 1-5 seconds, considering team context and opponent positions.
- Pass success prediction: predict the probability of a pass succeeding between any two players, given the current game state.
- Game state valuation: predict the value of the current game state — how likely is the attacking team to score vs. lose possession?
- Formation classification: classify the current tactical formation (4-3-3, 4-4-2, 3-5-2, etc.) from player positions.
- Defensive gap detection: identify gaps in defensive structure by analyzing the graph of defenders and their coverage zones.
- Team style embeddings: learn a vector representation of a team's playing style from their passing network structure.
Football involves permutation invariant reasoning — whether we label the CM as player 4 or player 8 shouldn't change predictions. GNNs naturally handle this. They also capture relational reasoning — the striker's value depends on the quality of service from midfielders, the goalkeeper's position, the defensive line height, etc. No other architecture captures these dependencies as naturally.
Each architecture is designed for different data structures. Here's how they compare:
| Aspect | CNN | RNN | GNN |
|---|---|---|---|
| Data structure | Regular grids (images) | Sequences (time series) | Arbitrary graphs |
| Connectivity | Fixed (neighbors in grid) | Fixed (prev → next) | Flexible (any connections) |
| Key operation | Convolution (sliding kernel) | Recurrence (hidden state) | Message passing |
| Football use | Pitch heatmaps, video frames | Event sequences, trajectories | Player interactions |
| Permutation | Not invariant | Not invariant | Invariant ✓ |
In practice, we often combine these! For spatiotemporal football data: use GNNs for spatial relationships between players at each timestep, and RNNs/Transformers for temporal relationships across timesteps. This combination is called a Spatiotemporal GNN — which is exactly what we'll cover in the next article!
- ✓ Graphs: nodes, edges, adjacency matrices
- ✓ Why CNNs/RNNs can't handle graph data
- ✓ Message passing: aggregate neighbor info
- ✓ Aggregation functions: sum, mean, max, attention
- ✓ GNN architectures: GCN, GraphSAGE, GAT, GIN
- ✓ Stacking layers expands receptive field
- ✓ Readout for graph-level predictions
- ✓ Football applications: trajectories, passes, xG
- 5. Spatiotemporal GNNs for Football
- Combining GNNs with RNNs/Transformers for dynamic graph data
- Advanced football analytics use cases
Excited about Graph Neural Networks? 👍 Dive into the code! Check out our GNN GitHub repository for implementations, tutorials, and resources to kickstart your hands-on journey.