How Is Data Collected in Football? Methods, Accuracy & Sources

Football data collection transforms millions of live match events into structured insights, but most clubs and analysts still misunderstand where their numbers actually come from.

By David Findlay, Founder of KiqIQ.

Quick Answer: Football data is collected through a combination of optical tracking systems installed in stadiums, GPS wearables on players, and trained human analysts who manually tag events during live matches, then validate them post-match for accuracy.

Definition: Football data collection is the systematic process of recording, tagging, and validating match events, player movements, and performance metrics using a combination of automated tracking technology and human analysis.

Key point: The accuracy of football statistics depends less on technology and more on the consistency of human interpretation applied to subjective events like key passes, big chances, and successful tackles.

How Live Match Data Is Captured

The Premier League and other elite competitions rely on a hybrid model combining automated systems and human oversight. Opta, the Premier League’s official data partner since 1997, deploys a three-person analysis team per match. Two analysts watch live video feeds and tag every touch, pass, and defensive action. The third analyst rewinds footage to validate entries in real time. Post-match, the entire dataset undergoes a second review to ensure accuracy.

This workflow captures objective events like goals, corners, and offsides with minimal error. Subjective metrics such as key passes or big chances require standardised definitions. Opta provides analysts with detailed criteria to maintain consistency across fixtures and seasons. Without these definitions, one analyst might classify a pass as key while another does not, undermining the dataset’s reliability.

Optical tracking systems such as TRACAB and Second Spectrum complement human tagging. Installed cameras track player and ball positions at up to 25 frames per second, generating positional data used for heatmaps, sprint distances, and average formations. These systems operate with an error margin below five per cent for objective metrics but still require human validation for context-dependent events.

The Role of GPS and Biometric Systems

Players in training and some competitive matches wear GPS vests that record distance covered, sprint speeds, acceleration, and deceleration. These wearables capture biometric data such as heart rate and workload, feeding into performance management and injury prevention protocols.

GPS data is highly accurate for raw physical output but does not capture tactical nuance. A midfielder covering eight kilometres in a match tells you nothing about positioning discipline, passing angles, or defensive pressure applied. Clubs pair GPS metrics with optical tracking and event data to build a complete performance profile.

The friction here is integration. GPS providers, optical tracking vendors, and event data companies often use different data formats and timestamps. Clubs must invest in data engineering to align these streams into a single source of truth. Smaller clubs without dedicated data teams face significant capture cost and workflow friction when attempting to merge datasets.
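
The alignment problem described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual format: the field names (`t`, `speed_ms`), the `clock_offset` between the two clocks, and the half-second `tolerance` are all assumptions made for the example. It attaches the nearest GPS sample to each tagged event and leaves a gap where no sample is close enough.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Index of the timestamp closest to t, or None if the list is empty."""
    i = bisect_left(sorted_times, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_times)]
    if not candidates:
        return None
    return min(candidates, key=lambda j: abs(sorted_times[j] - t))


def align_events(events, gps_samples, clock_offset=0.0, tolerance=0.5):
    """Attach the nearest GPS speed reading to each event.

    events       -- dicts with 't' in seconds on the event provider's clock
    gps_samples  -- dicts with 't' (GPS clock) and 'speed_ms', sorted by 't'
    clock_offset -- seconds added to GPS time to match the event clock
                    (hypothetical; in practice it has to be measured)
    tolerance    -- maximum accepted gap in seconds between event and sample
    """
    gps_times = [s["t"] + clock_offset for s in gps_samples]
    merged = []
    for event in events:
        j = nearest_index(gps_times, event["t"])
        close_enough = j is not None and abs(gps_times[j] - event["t"]) <= tolerance
        speed = gps_samples[j]["speed_ms"] if close_enough else None
        merged.append({**event, "speed_ms": speed})
    return merged


# Made-up data: three GPS samples and two tagged events.
gps = [
    {"t": 10.0, "speed_ms": 3.1},
    {"t": 10.1, "speed_ms": 3.4},
    {"t": 10.2, "speed_ms": 3.9},
]
events = [{"t": 12.12, "type": "pass"}, {"t": 30.0, "type": "shot"}]
aligned = align_events(events, gps, clock_offset=2.0)
for row in aligned:
    print(row)
```

Returning `None` when no sample falls inside the tolerance keeps the gap visible, which is usually safer than silently borrowing a reading from several seconds away.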

Advanced Metrics and Machine Learning Models

Metrics like Expected Goals (xG) and Expected Assists (xA) are not directly observed. They are calculated using machine learning models trained on thousands of historical shots. Each shot is assigned a probability of becoming a goal based on factors including shot location, angle, assist type, defensive pressure, and game state.
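
To make the idea concrete, here is a deliberately small sketch of how a shot can be turned into a probability. It assumes a logistic model with only two inputs, distance and the angle subtended by the goal mouth, and the coefficients are invented for illustration rather than fitted to any data. Production models from the providers named in this article use far more features and are not reproduced here.

```python
import math

GOAL_WIDTH = 7.32  # metres, the standard width of a football goal

# Illustrative coefficients only: these are NOT fitted to real shots.
INTERCEPT = -1.0
COEF_DISTANCE = -0.1  # per metre from the centre of the goal
COEF_ANGLE = 1.2      # per radian of visible goal mouth


def goal_angle(x, y):
    """Angle in radians between the two posts as seen from the shot location.

    x -- distance from the goal line in metres (must be > 0)
    y -- lateral offset from the centre of the goal in metres
    """
    return math.atan2(y + GOAL_WIDTH / 2, x) - math.atan2(y - GOAL_WIDTH / 2, x)


def expected_goal(x, y):
    """Probability of a goal under the toy logistic model above."""
    distance = math.hypot(x, y)
    z = INTERCEPT + COEF_DISTANCE * distance + COEF_ANGLE * goal_angle(x, y)
    return 1.0 / (1.0 + math.exp(-z))


# Compare a central shot from the penalty spot with two harder chances.
for label, x, y in [("penalty spot", 11.0, 0.0),
                    ("long range", 25.0, 0.0),
                    ("tight angle", 5.0, 9.8)]:
    print(f"{label}: xG = {expected_goal(x, y):.2f}")
```

The tight-angle shot sits at roughly the same distance as the penalty spot, so any difference in its value comes from the narrower view of the goal. A fitted model would learn its coefficients from historical shots rather than have them typed in.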

Oracle, the Premier League’s advanced analytics partner, uses machine learning to generate live metrics such as win probability and momentum tracking. These models simulate match outcomes over 100,000 iterations per game state, updating probabilities in real time. The output is designed for broadcast audiences but also informs coaching staff during matches.
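
The simulation approach can be illustrated with a simple Monte Carlo sketch. The details of Oracle's production model are not described here, so this example swaps in a common textbook assumption: each team's remaining goals follow a Poisson distribution scaled by the minutes left. The scoring rates, function names, and seed are hypothetical.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson-distributed integer (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def win_probabilities(home_goals, away_goals, minutes_left,
                      home_rate, away_rate, iterations=100_000, seed=0):
    """Estimate home/draw/away probabilities from the current game state.

    home_rate, away_rate -- assumed goals per 90 minutes for each side
    """
    rng = random.Random(seed)
    home_lam = home_rate * minutes_left / 90
    away_lam = away_rate * minutes_left / 90
    counts = {"home": 0, "draw": 0, "away": 0}
    for _ in range(iterations):
        home_final = home_goals + poisson_sample(rng, home_lam)
        away_final = away_goals + poisson_sample(rng, away_lam)
        if home_final > away_final:
            counts["home"] += 1
        elif home_final < away_final:
            counts["away"] += 1
        else:
            counts["draw"] += 1
    return {outcome: n / iterations for outcome, n in counts.items()}


# Home side leads 1-0 with 20 minutes left; the rates are made up.
print(win_probabilities(1, 0, 20, home_rate=1.5, away_rate=1.2))
```

Re-running the function whenever the score or the clock changes gives the kind of continuously updating probability shown on broadcasts, although a real model would also adjust the rates for factors such as red cards and team strength.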

StatsBomb and Wyscout offer similar advanced metrics, selling datasets to clubs and analysts. StatsBomb’s models account for defensive pressure, body position, and whether a shot followed a dribble or set piece. Wyscout integrates video with tagged events, enabling scouts to filter players by specific actions and watch corresponding footage.

Problems emerge when clubs apply these models without understanding their assumptions. An xG model trained on the top five European leagues may not generalise to lower divisions, where shot selection and defensive organisation differ. Analysts must validate model outputs against observed outcomes before embedding them in decision workflows.

Where Football Data Comes From by Provider

Opta focuses on event data and supplies the Premier League’s official statistics. Their dataset includes over 200 event types per match, from passes and tackles to recoveries and aerial duels. Opta’s data feeds the Premier League website, broadcast graphics, and third-party platforms like FBref.

StatsBomb collects event data with additional context layers, such as freeze frames showing player positions at the moment of a shot. This enables more granular xG models and defensive pressure metrics. Newcastle United, Wolverhampton Wanderers, and Everton have partnered with StatsBomb for recruitment and performance analysis.

Wyscout combines event data with video, offering a searchable database of tagged actions linked to match footage. Arsenal and Manchester United have used Wyscout for scouting, filtering players by specific attributes such as progressive passes or defensive duels won in the final third.

Second Spectrum and TRACAB provide optical tracking, capturing positional data used for tactical analysis. Their systems generate heatmaps, passing networks, and average formations. Clubs use this data to evaluate pressing triggers, defensive shape, and transition speed.

Each provider uses different tagging protocols and definitions. A successful dribble in one dataset may be classified differently in another. Analysts must account for these inconsistencies when comparing metrics across providers or competitions.

How Accurate Are Football Statistics

Accuracy varies by metric type. Objective events like goals, shots on target, and corners have near-perfect accuracy because they are verifiable and rule-based. Subjective events like key passes, big chances, and successful tackles depend on human interpretation and are vulnerable to inconsistency.

Optical tracking systems achieve positional accuracy within five centimetres under optimal conditions but can misidentify players during crowded sequences. GPS data is highly accurate for distance and speed but does not capture ball interaction or tactical intent.

Advanced metrics introduce a different accuracy concern. An xG model may assign a 0.3 probability to a shot, but that does not mean the shot had a 30 per cent chance of scoring in that specific instance. It means that historically, similar shots resulted in goals 30 per cent of the time. The model is probabilistic, not deterministic.
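
One way to test that historical reading is a calibration check: group shots by their xG value and compare the average prediction in each group with the share that actually went in. The sketch below assumes shots arrive as simple `(xg, scored)` pairs; the function name, bin count, and sample data are illustrative.

```python
def calibration_table(shots, n_bins=5):
    """Compare mean predicted xG with the observed goal rate in each xG bin.

    shots -- iterable of (xg, scored) pairs, 0 <= xg <= 1 and scored in {0, 1}
    """
    bins = [{"shots": 0, "xg_sum": 0.0, "goals": 0} for _ in range(n_bins)]
    for xg, scored in shots:
        index = min(int(xg * n_bins), n_bins - 1)  # xg == 1.0 goes in the top bin
        bins[index]["shots"] += 1
        bins[index]["xg_sum"] += xg
        bins[index]["goals"] += scored

    rows = []
    for i, b in enumerate(bins):
        if b["shots"] == 0:
            continue  # skip empty bins rather than divide by zero
        rows.append({
            "xg_range": (i / n_bins, (i + 1) / n_bins),
            "shots": b["shots"],
            "mean_xg": b["xg_sum"] / b["shots"],
            "goal_rate": b["goals"] / b["shots"],
        })
    return rows


# Made-up sample: ten shots rated 0.3, three scored; four rated 0.7, two scored.
sample = [(0.3, 1)] * 3 + [(0.3, 0)] * 7 + [(0.7, 1)] * 2 + [(0.7, 0)] * 2
for row in calibration_table(sample):
    print(row)
```

Where `mean_xg` and `goal_rate` sit close together in a bin with plenty of shots, the model's probabilities are behaving as described. A persistent gap suggests the model was trained on shots unlike the ones being analysed.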

Context loss is a persistent issue. Possession percentage does not distinguish between controlled build-up and aimless circulation. A team defending a lead may intentionally cede possession, making raw possession stats misleading without game state context.

How Clubs Use Collected Data

Liverpool’s data department, established after Fenway Sports Group’s 2010 takeover, integrates recruitment, performance analysis, and opposition scouting. Analysts filter transfer targets by specific metrics, then validate findings with video and live scouting. Performance analysts prepare pre-match reports highlighting opposition weaknesses and tactical tendencies.

Manchester City employs a similar structure, using data to inform tactical adjustments and player development. Analysts track individual player metrics over time, identifying performance trends and injury risk factors.

Bolton Wanderers under Sam Allardyce pioneered data use in the Premier League during the early 2000s. Allardyce created a war room where coaches analysed data on large screens, identifying high-value goal-scoring opportunities and optimising set-piece routines. With a smaller budget than competitors, Bolton used data to maximise efficiency.

Smaller clubs face higher capture costs. Subscriptions to StatsBomb or Wyscout can exceed £50,000 annually. Clubs without dedicated data engineers struggle to integrate multiple data streams, limiting their ability to generate actionable insights.

Free and Paid Data Sources

The Premier League website offers basic stats including goals, assists, shots, and pass completion. The site allows comparisons between two players and provides head-to-head team stats. Data coverage extends back to the 1992/93 season, though advanced metrics are only available from 2006/07 onwards.

FBref, launched in 2018, provides free access to detailed event data sourced from StatsBomb. The platform includes xG, xA, progressive passes, and defensive actions. Data is presented in sortable spreadsheets, enabling quick comparisons across players and teams. FBref is widely used by analysts, journalists, and fantasy football players.

StatsBomb and Wyscout require subscriptions but offer deeper datasets and advanced metrics unavailable elsewhere. Both platforms integrate video with tagged events, allowing users to filter actions and watch corresponding footage. Professional analysts and clubs rely on these platforms for recruitment and performance analysis.

RSSSF and Wikipedia provide historical match results and competition records. These sources are useful for context but lack the granularity required for performance analysis.

The Evolution of Data Collection Since 1992

The Premier League’s inaugural 1992/93 season recorded only basic match statistics: goals, assists, and disciplinary cards. Detailed performance analysis was not feasible with such limited data.

Opta’s partnership with the Premier League began in 1997/98, introducing event-level tracking. Early datasets remained simple, but from 2006/07 onwards, Opta expanded coverage to include key passes, tackle success, and shooting accuracy. This shift enabled clubs to begin using data for tactical and recruitment decisions.

Oracle’s partnership, launched in 2021/22, introduced live advanced metrics for broadcast audiences. Models like momentum tracking and win probability are calculated in real time, enhancing viewer engagement and providing coaches with in-match insights.

Clubs initially resisted data adoption. Bolton Wanderers under Sam Allardyce were early adopters, but widespread investment did not occur until Fenway Sports Group’s 2010 takeover of Liverpool. Today, every Premier League club employs analysts, data engineers, and performance scientists.

Common Pitfalls in Data Collection and Interpretation

Inconsistent definitions across providers create comparison challenges. A successful dribble in one dataset may require beating a defender, while another counts any forward movement with the ball. Analysts must verify definitions before drawing conclusions.
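
The effect is easy to demonstrate. The sketch below applies two hypothetical tagging rules, modelled on the two definitions just described, to the same handful of made-up ball carries. Neither rule is taken from a real provider's documentation, and the `Carry` fields are assumptions for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Carry:
    player: str
    metres_forward: float      # negative values mean the ball went backwards
    defenders_beaten: int
    retained_possession: bool


def strict_rule(carry):
    """Hypothetical provider A: the player must beat at least one defender."""
    return carry.retained_possession and carry.defenders_beaten >= 1


def loose_rule(carry):
    """Hypothetical provider B: any forward movement with the ball counts."""
    return carry.retained_possession and carry.metres_forward > 0


def count_dribbles(carries, rule):
    """Successful dribbles per player under the given tagging rule."""
    return dict(Counter(c.player for c in carries if rule(c)))


carries = [
    Carry("A", 12.0, 1, True),
    Carry("A", 8.0, 0, True),
    Carry("B", 5.0, 2, False),
    Carry("B", -3.0, 1, True),
    Carry("B", 6.0, 0, True),
]
strict_counts = count_dribbles(carries, strict_rule)
loose_counts = count_dribbles(carries, loose_rule)
print("strict:", strict_counts)
print("loose: ", loose_counts)
```

Identical footage produces different totals for the same player depending on the rule applied, which is why a dribble figure from one provider cannot be set beside another's without first checking the definitions.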

Context loss undermines raw stats. High possession does not guarantee dominance if the ball is circulated without penetration. Low tackle counts may reflect effective positioning rather than poor defending. Analysts must pair event data with video and tactical context.

Over-reliance on advanced metrics leads to poor decisions. A player who accumulates high xG may still score fewer goals than expected if their finishing is weak. Models provide probabilities, not certainties. Clubs must validate model outputs with qualitative assessment.

Data latency affects real-time decision-making. Optical tracking systems experience network delays, causing event timestamps to misalign. Post-match validation corrects errors, but live decisions may rely on incomplete data.

Frequently Asked Questions

Who collects data during Premier League matches?

Opta deploys a three-person team per match: two analysts tag events in real time while a third validates entries by rewinding footage. Post-match, the dataset undergoes a second review to ensure accuracy before publication.

How accurate is Expected Goals (xG) in football?

xG models assign probabilities based on historical shot data, not certainties. A shot with 0.3 xG means similar shots resulted in goals 30 per cent of the time historically. Accuracy depends on model training data and whether it reflects the competition and era being analysed.

Where can I access free football data?

FBref offers free access to detailed event data, including xG, xA, and defensive metrics, sourced from StatsBomb. The Premier League website provides official stats but with less granularity. Both platforms cover matches back to the 1992/93 season.

Do football clubs use GPS data during matches?

GPS wearables are permitted in training and some competitions but banned in the Premier League during matches. Clubs rely on optical tracking systems to capture positional and movement data during competitive fixtures instead.

What is the difference between Opta and StatsBomb data?

Opta focuses on event data with over 200 event types per match and supplies the Premier League’s official statistics. StatsBomb adds context layers like freeze frames and defensive pressure, enabling more granular advanced metrics. Both require subscriptions for full access, though FBref republishes some StatsBomb data for free.

Sources