UNIVERSIDAD DE COSTA RICA

SISTEMA DE ESTUDIOS DE POSGRADO

MULTIPLE MULTIPARTITE GRAPHS OCCLUSION SOLVER FOR SOCCER

PLAYERS’ TRACKING

Tesis sometida a la consideración de la Comisión del Programa de Estudios

de Posgrado en Ingeniería Eléctrica para optar al grado y título de Maestría

Académica en Ingeniería Eléctrica

LENNON NÚÑEZ MEOÑO

Ciudad Universitaria Rodrigo Facio, Costa Rica

2024


Dedication

To my parents and family, for always supporting me into achieving every goal.

To my managers, coworkers, and friends that helped me survive this adventure.

To God, for granting me challenges for growth.

Thanks

ii


Acknowledgement

This research was supported by CeNAT - CONARE Fellowship Program, in collaboration with CNCA

(National Advanced Computing collaboratory)

This research was partially supported by a machine allocation on Kabré supercomputer at the Costa

Rica National High Technology Center

iii


This thesis is approved by the Electrical Engineering Postgraduate Studies Committee of the
University of Costa Rica, as a partial requirement to opt for the Master’s degree in Electrical

Engineering.
Esta tesis fue aceptada por la Comisión del Programa de Estudios de Posgrado en Ingeniería

Eléctrica de la Universidad de Costa Rica, como requisito parcial para optar al grado y título de
Maestría Académica en Ingeniería Eléctrica.

Dr. Jaime Cascante Vindas
Dean Representative

Sistema de Estudios de Posgrado

Dr. rer. nat. Francisco Siles Canales
Thesis Director

Ph.D. Esteban Meneses Rojas
Thesis Assessor

Ph.D. Marvin Coto Jiménez
Thesis Assessor

M. Sc. Marco Villalta Fallas
Director Representative

Programa de Posgrado en Ingeniería Eléctrica

Lennon Núñez Meoño
Candidate

iv


Contents

Cover Page i

Dedication ii

Acknowledgement iii

Approval Sheet iv

Table of Contents v

Abstract vii

Resumen viii

List of Tables ix

List of Figures x

Glossary xiii

Acronyms xv

Nomenclature xvii

1 Introduction 1

1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Document Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Professional Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.4 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.5 Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.6 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.7 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2 Theoretical Background 17

v


2.1 Multiple Object Tracking - Soccer Players Tracking . . . . . . . . . . . . . . . . . . . . . 17

2.2 Occlusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.3 The Graph Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.4 Multiple Multipartite Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.5 Player Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.6 Multiple Object Tracking Metrics for Validation . . . . . . . . . . . . . . . . . . . . . . . 40

3 Methodology 46

4 Implementation 48

4.1 Multipartite Graph Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.2 Duplicate solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.3 MHT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.4 MMPG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.5 Filtering spurious . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.6 Handling Occlusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5 Results and Discussion 68

5.1 Test Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.2 Results Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6 Conclusion and Recommendations 90

Bibliography 92

Appendix A ACE tracking platform configuration file 104

Appendix B Annotation Campaign Notifications Google App Scripts Code 108

Appendix C Detecting Occlusion 112

vi


Abstract

Multiple Multipartite Graphs tracking, a derived approach by mixing Multiple Hypothesis Tracking with

Multipartite Graphs, was integrated into ACE soccer player tracking system to address occlusion events.

Additionally, a spurious filtering based on hue histograms and predominant color swatches evaluation was

integrated to provide the tracking algorithms with a data set clean of spurious objects, which considerably

and negatively affect the tracker performance. Combining this spurious filter with other methods to handle

occlusion events, as occlusion alarm and merge blob splitting, it was possible to achieve a reduction of 88% in

false positives and an increase of 25% in MOTA in the test data set. When evaluating the tracker in reduced

sets to study occlusion events more closely, a MOTA of 76% was achieved, representing an improvement of

50% when compared against the original tracker with a MOTA of 50.55% under the same data set.

vii


Resumen

Rastreo por Múltiples Grafos Multipartitos, derivado de mezclar rastreo por múltiples hipótesis con grafos

multipartitos, fue integrado en la plataforma de rastreo de jugadores de fútbol ACE para enfretar eventos de

oclusión. Adicionalmente, un filtro de espurias basado en histogramas del tono y la evaluación de colores

predominantes en el muestreo de color fue integrado para proveer a los algoritmos de rastreo de conjuntos de

datos libres de espurias, las que considerablemente afectan de manera negativa al rendimiento del rastreador.

Al combinar este filtro de espurias con otros métodos para manejar eventos de oclusión, como lo son la alarma

de oclusión y el separador de blobs fusionados, fue posible obtener una reducción del 88 % en los falsos positivos

y un incremento del 25 % en MOTA. Al evaluar el rastreador en un conjunto de datos reducido para estudiar

más de cerca los eventos de oclusión, se alcanzó un MOTA de 76 %, representando una mejora del 50 % al

comparar con el MOTA de 50.55 % obtenido por el rastreador original para el mismo conjunto de datos.

viii


List of Tables

4.1 Comparison of Hypothesis-Oriented MHT and Track-Oriented MHT . . . . . . . . . . . . . 53

5.1 TARÁ System Specification [92] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.2 Tracking Metrics Summary - Individual Enabled Features . . . . . . . . . . . . . . . . . . . 80

5.3 Tracking Metrics Improvement - Individual Enabled Features . . . . . . . . . . . . . . . . . 81

5.4 Tracking Metrics Summary for Multiple Hypothesis Tracking (MHT) Variations . . . . . . 81

5.5 Tracking Metrics Summary for Multiple Multipartite Graphs (MMPG) Variations . . . . . 82

5.6 Metrics for various feature sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5.7 Metrics for Original and mht_hue_dp_rm_ms features . . . . . . . . . . . . . . . . . . . . 84

5.8 Performance Improvement Metrics for mht_hue_dp_rm_ms feature . . . . . . . . . . . . . 84

5.9 MOTA evaluation with false positives error weight equals 0 . . . . . . . . . . . . . . . . . . 85

5.10 MOTA for the MHT with Hue Filter, Duplicate Solver and Merge Splitter features with

false positives error weight equals 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

5.11 Metrics of MMPG tracking of occlusion event O12
951,979 . . . . . . . . . . . . . . . . . . . . . 89

ix


List of Figures

1.1 Football Soccer (Association Football) field dimensions . . . . . . . . . . . . . . . . . . . . . 2

1.2 TRACAB Soccer Players Tracking System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Match Analysis system flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.1 ACE computational platform architecture layers . . . . . . . . . . . . . . . . . . . . . . . . 18

2.2 Segmentation issues on players detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.3 Examples of scenarios perceived as occlusion events. . . . . . . . . . . . . . . . . . . . . . . 21

2.4 Sequential graph where missing identities suggest an occlusion event [26] . . . . . . . . . . . 22

2.5 Graph examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.6 Complete graph example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.7 Bipartite Graph Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.8 Bipartite Graph examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.9 Multipartite graph hypothesis example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.10 Multipartite graph processing with a window size 3 of a scene comprised of 6 frames . . . . 30

2.11 Multipartite graph processing with a window size 6 of a scene comprised of 6 frames . . . . 31

2.12 Hypothesis tree representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.13 Multiple multipartite graphs depiction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.14 Comparing players from different teams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.15 Player model and similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.16 Problem with occlusion and euclidean distance . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.17 Tracking object’s area overlap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.1 Flow diagram of ACE multipartite graph tracker . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2 Examples of errors in ACE tracker identity assignment . . . . . . . . . . . . . . . . . . . . . 51

4.3 Examples of duplicate identities error in ACE tracker fixed by duplicate solver . . . . . . . 52

4.4 MHT Track Oriented with levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.5 TrackHypothesis UML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.6 Updated Player Blob . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.7 Spurious blobs vs Player blobs hue channel histogram filtering . . . . . . . . . . . . . . . . 59

x


4.8 Predominant color swatches for spurious filtering . . . . . . . . . . . . . . . . . . . . . . . . 60

4.9 Example of effectiveness of the hue filter in removing spurious blobs . . . . . . . . . . . . 61

4.10 ACE Platform detecting corner as player . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.11 Spurious blobs vs Player blobs hue channel histogram filtering . . . . . . . . . . . . . . . . 62

4.12 Ghost Player . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.13 Ghost Blob wit drifting next position prediction . . . . . . . . . . . . . . . . . . . . . . . . 64

4.14 Splitting merged blobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.1 Kabre supercomputer overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.2 Camera Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5.3 ISSIA database issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.4 Input vide example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.5 PRIS Annotation Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.6 PRIS Annotation Tool - Annotation pannel . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.7 Annotation Campaign enrollment automation flow . . . . . . . . . . . . . . . . . . . . . . . 76

5.8 Annotation Campaign form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.9 Annotation Campaign e-mail confirmation message . . . . . . . . . . . . . . . . . . . . . . . 77

5.10 Examples of occlusion events and players’ position manual annotations generated with the

PRIS Annotation Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5.11 ACE Platform identity retention after an occlusion event . . . . . . . . . . . . . . . . . . . 86

5.12 ACE Platform original vs updated in multiple occlusion events . . . . . . . . . . . . . . . . 86

5.13 Occlusion event altered by segmentation error . . . . . . . . . . . . . . . . . . . . . . . . . . 87

5.14 Snippet of complex occlusion event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

C.1 Occlusion clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

C.2 Graphical description of modified Stretch Index SI(R) . . . . . . . . . . . . . . . . . . . . . 116

C.3 Soccer player match scenes classification by player’s agglomeration . . . . . . . . . . . . . . 117

C.4 Players Dispersion Level of a 40 s soccer match clip . . . . . . . . . . . . . . . . . . . . . . 118

C.5 Estimated bivariate probabilistic function of soccer players spatial distribution for frame

k = 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

C.6 Estimated bivariate probabilistic function of soccer players spatial distribution for frame

k = 600 and PDL < 400, classified as Bad-Behavedscene . . . . . . . . . . . . . . . . . . . . 121

xi


C.7 Estimated bivariate probabilistic function of soccer players spatial distribution for frame

k = 1100 and PDL > 500, classified as Well-Behavedscene . . . . . . . . . . . . . . . . . . . 122

xii


Glossary

Bad-Behaved Scene Frame from a soccer match video recording in which player blobs are agglom-

erated in such a way it becomes .. xi, 113, 114, 117, 120, 121

Dynamic Occlusion Event Occlusion event in which the occlusion’s Agglomeration Size is variable

and/or the event is comprised of smaller intermittent occlusion events, e.g. new players inter-

acting with previously occluded ones at any point of the event, players constantly entering and

leaving the occlusion event area, two players occluding intermittently, etc. . 23, 69

False-new-player Case in which a player’s identity is lost during a certain amount of frames, after

which the same player is identified as a new one, therefore being assigned a new player identity

label. E.g. have players identified as p1 and p2 during frames 40 to 45, during frames 46 to 50 p2

is not detected, but it is detected again as a player during frames 51 to 60. Due to being missing

in frames 46 to 50, it is detected as a new player entering the scene, therefore, it is assigned label

p3 instead of p2, with no connection in tracking history of labels p2 and p3.. 22

Identity Swap Case in which at least two players have their identity labels swapped. Usually occurs

with players of the same team in which visual features and proximity lead the tracking algorithm

to detect one as the other.. 7–9, 11, 12, 20, 22, 24

Medium-Well-Behaved Scene Frame from a soccer match video in which player blobs are some-

what spaced, tracking algorithm may or may not solve occlusion events.. 113, 117

Multiple Object Tracking Accuracy A performance metric used in computer vision and tracking

systems to evaluate how accurately and consistently objects are tracked over time. It measures

the overall quality of a tracking system by considering factors such as missed objects, false

positives, and identity switches in the tracking process.. 11, 69

Multiple Object Tracking Precision A performance metric used to evaluate the precision of a

multiple object tracking system. It measures the accuracy of the predicted object locations by

calculating the average distance between the predicted and ground truth object positions.. 84

xiii


Occlusion Visual effect in which, due to the projection of a 3D scene into a 2D plane, the object

of interest is visually blocked by another object from the same scene. Angle perspective of the

camera and objects proximity affect occlusion perception. The occlusion is identified by its length

(video frames) and event complexity.. 6–13, 15–17, 19–24, 112, 113

Occlusion Agglomeration Size Number of players involved in an occlusion event.. 23

Occlusion Dynamics Index Value that represents the complexity of the occlusion event’s dynamics.

It is calculated as the sum of the weights of different occlusion dynamics: intermittency, team

interaction.. 23

Occlusion Event Complexity Score that depicts how complex an occlusion event is in terms of the

involved objects’ interaction and the occlusion area objects density. The event can be categorized

by its Dynamics, as Steady Occlusion Event or Dynamic Occlusion Event, and by its Agglom-

eration Size, represented by the number of players involved. It’s the product of the Occlusion

Dynamics Index and its Agglomeration Size.. 23, 113

Occlusion Event Region Region in the image plane where the occlusion event occurs. Its location

changes accordingly to the players location, and its size may vary if it is a Dynamic Occlusion

Event.. 23

Steady Occlusion Event Occlusion event in which the occlusion’s Agglomeration Size is constant,

i.e. the same occluded players are present from start to end of the event, and there is no

intermittency of players. . 23, 69

Well-Behaved Scene Frame from a soccer match video in which player blobs are spaced and may

be present only simple occlusion events that the tracking algorithm can solve without needing

an occlusion solver.. xii, 113, 117, 120, 122

xiv


Acronyms

AI Artificial Intelligence. 14

BPG Bipartite Graph. 10

CNCA National Advanced Computing Collaboratory (acronym in Spanish). 70

DOE Dynamic Occlusion Event. 23, 69, Glossary: Dynamic Occlusion Event

HOMHT Hypothesis Oriented Multiple Hypothesis Tracking. 54

IoU Intersection over Union. 40

KDE Kernel Density Estimation. 70, 122–124

MHT Multiple Hypothesis Tracking. ix, 6, 11–14, 31, 32, 34, 47, 48, 52, 54, 57, 62, 68, 72, 81, 84,

85, 88, 90, 112

MMPG Multiple Multipartite Graphs. ix, 6, 13, 14, 17, 32, 48, 55, 57, 62, 72, 80–82, 84, 86, 88–90,

124

MOT Multiple Object Tracking. 6, 8, 10, 17, 20, 62

MOTA Multiple Object Tracking Accuracy. 11, 69, 84–88, Glossary: Multiple Object Tracking

Accuracy

MOTP Multiple Object Tracking Precision. 84, Glossary: Multiple Object Tracking Precision

MPG Multipartite Graph. 6, 10, 13, 14, 27–32, 34, 49–51, 54–57, 65, 90, 122, 123

OAS Occlusion Agglomeration Size. 23, 24, Glossary: Occlusion Agglomeration Size

ODI Occlusion Dynamics Index. 23, 24, Glossary: Occlusion Dynamics Index

ODI-I Occlusion Dynamics Index Intermittency weight. 23, 24

xv


ODI-T Occlusion Dynamics Index Team weight. 24

OEC Occlusion Event Complexity. 23, 24, 113, Glossary: Occlusion Event Complexity

OEC Occlusion Event Region. 23, Glossary: Occlusion Event Region

OECS Occlusion Event Complexity Score. 23, 24, 69, 85

PDL Players Dispersion Level. 116, 117, 120, 123, 124

PF Particle Filter. 8, 9, 11, 12, 22, 23, 27, 38

PRIS-Lab Pattern Recognition and Intelligent Systems Laboratory. 18, 69, 70, 72, 74, 78

ROI Region of Interest. 12, 22

SOE Steady Occlusion Event. 23, 69, Glossary: Steady Occlusion Event

TOMHT Track Oriented Multiple Hypothesis Tracking. 54–56, 124

WSL Windows Subsystem for Linux. 48, 70

xvi


Nomenclature

AgIx Agglomeration Index. It has a value of 1 if there are only two players present

in the occlusion event, and the number of players involved in any other case.

CS Complexity Score, calculated as CS = DyIx ∗AgIx

DyIx Dynamics Index. The sum of Intermittency and team interaction weights.

DyIx = ODIteam +ODIinttermitency

Gi Single graph, with position i inside a multipartite graph.

k Frame number.

kf Final frame.

ki Initial frame.

mpgki,kf
Multipartite graph starting on frame ki and ending on frame kf .

N Number of nodes.

n Node of a graph.

O Occlusion event.

o Occlusion event snippet.

Ocomplexity
ki,kf

Occlusion event from frame ki to kf , and complexity score complexity.

ok,objects Occlusion event snippet in the frame k, involving objects number of individ-

uals.

w Window of frames used to perform object tracking in sections of the video.

wK Window of size k.

wki,kf
Window composed of the frames from ki to kf .

Examples:

xvii


• Let mpg300,800 a multipartite graph of size 501 graphs, result of the tracking process from frames

ki = 300 to kf = 800 in the video window w300,800.

• The graph G4, of the multipartite graph mpg300,800, has N = 22 nodes, while the graph G5 has

N = 18 nodes, an error in the tracking is possible as less objects are tracked in the subsequent

graph.

xviii


1

Chapter 1

Introduction

This chapter covers the project’s background and motivation, problem description and proposal of the

solution to the problem stated.

1.1 Background and Motivation

Object tracking is used in different areas where it is important to know the motion of the objects

of interest; some examples are submarine tracking, where a submarine needs to recognize projectiles

and other ships moving alongside using passive sonars, aircraft surveillance with a similar scenario

but using passive radars or an electronic warfare device to recognize the moving surroundings [77], in

analysis of sports data [92], and, more recently, in medicine fields for cell tracking [54].

In terms of sport disciplines, the athletes usually use visual guides to improve and practice, be it

recordings of the sport in execution or other visual aids on details of the discipline. The traditional

method to analyze all the information available from those sources is to manually pick the scenes of

interest and then select key plays to replay and study, the coach then will design a strategy based

on the information extracted to improve the athlete’s performance. Object tracking is introduced as

a way to automatize data extraction for sports analysis, some disciplines require the tracking of an

athlete’s motion, in other cases like team sports, it is required to capture the motion of the many

players involved. Players tracking goal is to provide information of the position of the player at any

time during the whole match, by analyzing this simple piece of data it is possible to tell different

statistics of its performance: total time played, distance traveled, area of the field covered during

the match, speed, fatigue and placement of the player during different key plays [69]. Thus, players’

performance and tactical evaluation can be assessed based on information from object tracking, making

it a valuable tool for sports analysis.

One of the most popular sports in the world, with an estimate of more than 3.5 billion of fans,

is the Association Football, or simply Football or Soccer [97]. It is a game in which two teams of 11


2

players, using any part of their bodies except their hands and arms, try to maneuver the ball into the

opposing team’s goal area, with a valid maneuvering field of about 90–120 m long and 45–90 m wide.

The team that scores more goals wins at the end of a match [72].

Figure 1.1: Football Soccer (Association Football) field. Source https://www.conceptdraw.com/

examples/football-pitch-dimensions-in-metres

In the last years, some companies have dabbled into soccer players tracking for sports data analysis,

the following are some examples of the most cited commercial applications in professional literature:

Kizanaro: It’s a Uruguayan company started in 2008 as part of an academic project, with

the purpose to provide services into soccer applied technology. It is used by the soccer teams of

Angola, Uruguay, Paraguay, Venezuela, Colombia and others. The company states that it allows a fast

visualization and analysis of the matches, obtaining information about the performance of the teams

and players into video sections. The software classifies and breaks down the main plays of the match

[41].

TRACAB: Optical Player Tracking solution from ChyronHego team, started in 2003 in Sweden.

It’s installed in about 300 soccer stadiums, being the official tracking technology used in leagues such

as the English Premier League, German Bundesliga and Spanish La Liga, having also been selected

for the major international UEFA and FIFA tournaments, as well as other sports than soccer. It uses

camera arrays to record the field and delivers 3D information of the matches. The object tracking is

supervised by a human operator (figure 1.2) [91].

https://www.conceptdraw.com/examples/football-pitch-dimensions-in-metres
https://www.conceptdraw.com/examples/football-pitch-dimensions-in-metres


3

Figure 1.2: TRACAB Soccer Players Tracking System: Camera array and super-
vised real time tracking. Source: https://chyronhego.com/products/sports-tracking/

tracab-optical-tracking/

ChyronHego also offers others solution to sports analysis as an annotation tool, named Paint,

used in sports news broadcasting to better visualize the different key plays. Nonetheless, the tool is

a graphical visualization aid for sports journalists rather than a tracking or ”ground truth” generator

framework [63].

STATS: Company with more than 35 years in the market working with sport fans and athletes.

The company offers different solutions into sports analysis and fan engagement, for data capture the

system SportVU is used, which offers performance statistics via extraction and processing of players

and ball coordinates during the match, using High Definition (HD) cameras and statistical software

and algorithms [89], [90].

Match Analysis: Started as a way to share the coach’s view to the team players, introduced

into the market in 2000 in USA with data collection centrals in USA and Mexico. It brings a set of

tools for video analysis and digital libraries to provide performance information and data storage, to

perform tactical analysis of soccer teams. Real time data is collected and then synced with the video

recordings by the analysts [51].

Since in 2015 the IFAB (International Football Association Board) allowed the use of techno-

wearables during some soccer matches [23] as well as the standardization of tracking systems by

FIFA [24], many other technologies are also taking part into soccer player tracking, including different

accessories into the players’ uniforms, one example is the use of GPS devices placed into the players’

shirts [34], [88]. However, optical players tracking has the advantage of not being intrusive on the

players and that it requires just the video cameras and not any other support accessory, it’s cheaper

and the methods can be exported to other areas on optical tracking. More recently even artificial

https://chyronhego.com/products/sports-tracking/tracab-optical-tracking/
https://chyronhego.com/products/sports-tracking/tracab-optical-tracking/


4

Figure 1.3: Match Analysis system flow. Source: http://matchanalysis.com/process.htm

intelligence systems are integrated into tracking players.

Due to the nature of private commercial software, it is not clear in general terms if the implemen-

tation is fully automated and which algorithms or methods are used to perform the object tracking

tasks.

The Laboratory of Pattern Recognition and Intelligent Systems (PRIS-Lab) has developed a tool

to perform Automated Soccer Players Tracking. This platform, named ACE, processes off-line video

recordings of soccer matches’ obtained from a single point of view, and returns the tracklets of the

players’ positions throughout the video. The current project development is on this platform, accessing

the source code and soccer matches ground truth data base. The main goal is to improve robustness

of the platform to occlusion events by implementing a Multiple Hypothesis method on it’s multiple

object tracking process.

1.2 Document Outline

The present work is organized as follows: first and current chapter is an introduction to object tracking

and automatic soccer player tracking to define the field of research and the problem to solve, it develops

the State of the Art i.e. the study of related professional literature to settle down the line of work,

and the scope and objectives of the project; chapter 2 contains the theoretical background on optical

object tracking and multiple hypothesis approaches; chapter 3 corresponds to the methodology, where

framework and time frame are described; chapter 4 covers the Multiple Hypothesis with Multipartite

Graphs implementation details for multiple object tracking as well as other developments into increas-

ing the tracker performance, chapter 5 presents and discusses the results, chapter ?? includes final

http://matchanalysis.com/process.htm


5

conclusions and recommendations. The final section contains the appendices with different supporting

documentation, as well as an additional brief research on estimating players’ distribution on the field

to dynamically enable/disable tracking algorithms in a multi-feature tracking platform.


6

1.3 Professional Literature

There exist different methods and algorithms that have been developed to solve the many problems

that arise when trying to track multiple objects, specially when these objects share so many features

in common. M. Villalta [92], based on the Object Tracking survey by A. Yilmaz et al. [99], proposes a

classification of these methods in three main classes according to the core functionality of the tracking

algorithms, suited as Deterministic Methods, Probabilistic Methods and Data Association Methods.

It is important to highlight that the order of appearance of these classes is a remark on the timeline

of evolution of the approach mindset, as prior methods were not able to completely solve the Multiple

Object Tracking (MOT) problem, it also depicts the focus transition when simpler issues were solved

and the more complex had to be taken into account, like occlusionevents. Furthermore, M. Manafifard

et al. [49] bring a detailed and extense survey on Player Tracking in soccer videos, reviewing about

163 different publications on the issue, studying cases like Multiple Hypothesis Tracking (MHT),

Particle Filters, Graph Representation and Occlusion Detection and Handling. From this study it

is important to highlight the following: detection (segmentation, background substraction) is critical

in the performance of the tracker, occlusionsare the biggest challenge in MOT, MHT has improved

different existent methods in occlusionsolving matters outperforming single hypothesis tracking, there

is no mention of Multipartite Graph (MPG) approach as presented by [92] nor a mixture of MHT and

MPG resulting in Multiple Multipartite Graphs (MMPG).

The following reviewed literature is used to illustrate the previous statements, the evolution of

different approaches and to sustain MHT for occlusionmanaging. In addition, a subsection has been

separated to discuss the works which implementations are focused on being robust to identity ambi-

guities and occlusionevents, appendix ?? depicts the distribution of reviewed methods in a table.

Deterministic Methods

These are methods that use visual characteristics to perform a deterministic trajectory tracking, i.e.

there is a model that describes its motion and/or its appearance. The basis of most of these methods

is to use brute force to find areas on an image that match a predefined and well known pattern. These

methods fail as the track is lost when objects interact with others alike (players of the same team) and

can’t solve occlusionby itself.

The main approach in this category is known as template matching, where a predefined template

is compared with different regions of the image, whenever it matches best, an object (player) would


7

be identified [10], [79]. The problem with template matching is that the algorithm is fully dependent

on its similarity with the objects of interest, if the template is not compatible with the players shape

or color in a determined frame, it will fail. This is common on soccer player tracking due to the many

visual changes a person has as it moves around the field turning around, crouching, falling and other

shape shifting events, enforcing the necessity of a big and robust templates dataset to consider the

many scenarios, also, the tracking result will have Identity Swap as the players of the same team have

similar chromatic distribution.

Lefèvre introduces the use of active contours into soccer player tracking with a snake-based method

[46]. The non-rigid nature of a soccer player is considered in this method, allowing a reconstruction

of the contours of the players whenever its necessary, adding flexibility in comparison with a fixed

template-matching. However, it would fail when occlusionhappens, as the contour may consider both

players (in a 2 objects occlusioncase) as a single player, due to one player blocking the other in a 2D

view.

To combat the occlusionproblem, some methods implement data fusion of information provided

from different sources to aid the players detection. A template is updated with newer information from

new observations over time, characteristics as size, texture and color are used to differentiate between

players, and states are stored so that correct assignment is done after the occlusionpasses, which doesn’t

fully solve occlusionassignment. This method increases the complexity of the tracking platform as it

incorporates an observation unit to update the templates, and also the template complexity by adding

new features to recognize (color, size, texture). As it still depends on a template, the right update of

the template and its initial state are crucial, which forces to manually declare the initial states of the

match to aid tracking [53], anyways, similar and occluded players remains a problem.

Probabilistic Methods

Most of these methods are based on Bayesian inference and estimations to perform object tracking.

They were introduced as a solution to optimal estimation of non-linear non-Gaussian state-space

models, situations where the typical linear model-based Kalman Filter was not enough [22]. The main

idea of these methods is to estimate the position of the object on a frame and update its likely position

with new measurements, the measurements are assigned certain probability (weights) based on previous

information until an optimal result is selected. It is important to highlight that these methods’ decisions

are based on probability, it chooses the most likely solution based on the cumulative probability of the


8

assignments along the different frames, therefore, a wrong assignment will spread over time.

One of the most popular probabilistic algorithms for object tracking is known as Particle Filter

(PF), introduced in 1993 as bootstrap filter, a numerical method for the solution of optimal sequential

state estimation problems in non-linear non-Gaussian scenarios [57], and with increasing applications

due to how the algorithm has evolved in hand with computational power. Due to its extended use in

the topic it has been considered important to dedicate this section to its implementation.

Czyz, Ristic & Macq implement a color-based particle filtering algorithm, it was first designed to

work with single object tracking, thus leading to wrap all players of the same team in a single global

state. As particles are attracted to areas of high density, error from a single player is dragged through

states update. The solution approached is to add a color histogram and to join detection and tracking

tasks to distinguish players of the same team [19]. The experiments show that PF is a powerful tool

on MOT, but the scenes analyzed did not consider occlusionevents, and the issue of particles moving

towards areas of high density persists, which leads to Identity Swap of players that are close together.

To solve the wrong assignment due to particle mixing, some authors have implemented single particle

filtering for each player.

Single particle filters per player increases the computational processing requirements as the al-

gorithm is applied for each player. Dearden, Demiris & Grau not only use Particle Filter for each

player but decide to spatially segment the most likely areas where the player may move, thus including

a spatial window and delimiting the possible solutions of the algorithm; PF is then applied on the

delimited area [20]. This approach diminishes the influence of one player’s error into the rest, however,

the occlusionproblem remains. During an occlusion, the particles used may mix and get messy, there-

fore, the reliability of the data obtained throughout the event decreases the longer the players remain

occluded. After the players separate from each other, using color-based recognition, the particles are

reassigned to the respective player by estimating the trajectory that each player had before the event.

Even though, the authors mention that particles mixing is still a possible outcome, so that all particles

are assigned to a single player (therefore, losing the other player) or leading to Identity Swap with

players of the same team.

Kristan et al. implement multiple single-trackers to confront the multi-target tracking problem,

including closed-world assumptions to aid in the visuals processing [43]. The robustness of the algo-

rithm is improved due to the constraints of a closed-world, as it was tested for indoor matches, however,

those assumptions cannot be considered when working with outdoor sports, as outdoor conditions are


9

not easy predictable (e.g. a cloud’s shadow may cover the field totally or partially for a short or long

period of time, it may rain or get foggy), also, the camera is statically placed above the court hanging

from the ceiling, position that cannot be achieved in an outdoors stadium.

A major problem when dealing with Particle Filters is the degradation of particle assignment,

specially when dealing with occlusion, as particles are wrongly assigned due to lose of information

in the involved frames, which leads to losing players’ identity after an occlusionevent (Identity Swap

and new player condition after a split), therefore, PF alone cannot solve the problem. Itoh et al.

[35] introduce a time-situation graph beforehand to an occlusionevent, the graph is used to solve the

players’ tracking lost after an occlusion, the quantity of players involved in the event is derived from

the graph association prior and after the players interaction, thus allowing to assign the “new” players

detected after the occlusionto the existing ones after the occlusion, fixing the splitting problem, but

the swapping remains as the algorithm is based on PF. Similar approaches introduce the use of graphs

to retain the states of the players so that they are not lost after an occlusionor in boundaries conditions

as when player exits the frame (for broadcast recordings) or exits the field limits.

Data Association Methods

These methods address the object tracking as to find the optimal solution to a data association problem,

where the data to associate is obtained from the object detection step. The main goal of these methods

is to solve the wrong assignment and object track losing due to occlusionor similar players collision,

most of the time used as a complement to a probabilistic/deterministic approach.

The simplest implementation of a data association method is using the Kalman filter due to its

recursive nature, used to update the object-track assignments [75]. It is stated that for multiple-object

tracking, there should be an additional computing to identify each detected object before actually

tracking them, thus the method results are dependent on the right object detection step frame to

frame.

The preferred representation of data for these methods is via a graph, were usually the nodes

represent different states of a single player or team, and edges are weights that connect one state with

the possible next. Soomro, K. et al. [86] implement a graph approach where each team is represented

by a graph, in which the nodes correspond to player positions and the edge weights depend on spatial

inter-player distance. They use color-based tracking to differentiate between teams to assign the player

to one or another graph, and teams formations data to aid into the estimation of the position of the


10

players, with the assumption that players remain in an organized manner, therefore, it is expected

that error increases when players won’t completely follow the initial formation. It also uses only video

clips from broadcast matches with an overhead view, occlusionis dismissed.

Figueroa et al. use the graph representation for player tracking by focusing on the occlusionevents,

where the nodes are used to represent different blobs to distinguish the players, aided with features

as mentioned in above studies, like multi-camera view and color. Moreover, the study adds the use of

morphological features for blob separation (like a pixeled player’s size and shape model); the algorithm

uses a backward and forward graph representation to optimize the graph and aid on multiple cameras

to simplify the graph in occlusionevents. This is done by selecting the best view depending on the

distance of the blob to the center of the field of camera view and which one brings the most information

of isolation of a player, i.e. if in a view the occlusionoccurs, other camera can be used to verify if a

player becomes isolated in that view [25], [26].

Wei-Lwun et al. use the graph representation to keep track of the previous assignments, using

Bipartite Graphs (BPGs) to guarantee that one-to-one assignment is done by graph constrains; the

graphs allow continuity from existing tracks, in contrast with other methods which nature tends to

the creation of new tracks anytime an object is detected (as with particle filtering after occlusionwhen

dealing with MOT). Wei’s implementation considers the players as entities, each containing a set

of features (faces, numbers on uniform, skin or hair color) that compose the Conditional Random

Field (CRF), thus integrating different sources of information to improve the player recognition and

tracking [96]. Siles, F. follows a similar approach by combining a set of features (color, texture, shape

and kinematic model of soccer players) to the edges of a BPG to aid the tracklet-player assignment

[81]. Later, Villalta, M. expanded the graph usage from a BPG to a MPG, the assignment is the

solution of a graph energy minimization problem [92]. Villalta also includes a single panoramic view

of the whole field by stitching the images from two 4K cameras used to record the matches. This

approach eliminated problems related to resolution or temporal segmentation needed for clips from

TV broadcast, however, occlusionproblems remain as the optical capture is from a single point of view,

even though the use of MPGs improves the correct assignment of blobs identities, occlusionis not fully

solved. It is also important to highlight that using two 4K images recording impacts in the amount of

data to process (2x4k cameras of 3840x2160 px each, recording at 30 fps, leading to a load of 995 328

000 pixels per frame), thus, the use of high capacity computers like clusters or parallelization of the

algorithm is needed to speed-up the process [93].


11

The typical implementation of a graph tracker requires to divide the whole video track into several

blocks to process, where the initial state of the next block is defined by the graph solved from the

previous one. There are two considerations on this: first, an error on any assignment (new blobs,

Identity Swap, etc) will spread to the next block, second, the graph solving is a local solution for

the block being contemplated, there exists information that’s ignored on the current block because

it is contained in further blocks, specially in occlusionmatters. Gedikli, S. et al. [30] implement a

probabilistic method for estimating the trajectory of the players adding a MHT algorithm. The MHT

preserves multiple hypotheses and their evolution through a defined temporal sequence (window), at

the end of the window a decision is made with the best fitting hypothesis; Gedikli states that MHT

was able solve some misclassifications result from occlusionevents that the typical tracker could not

solve.

Multiple Hypothesis, initially presented as a solution for noisy and complex object tracking scenes

in 1979 by Reid [1], is formerly widely used in scanner applications [6], where different measures

are received on each scan cycle and spreading of a wrong assignment is critical in application such

as projectile tracking, submarine location and others with a tight error acceptance. Moreover, it is

recently extended to scenes considered complex due to the dynamic interaction between the objects

to be tracked, like pedestrians and sport players tracking, where it outperforms single hypothesis

algorithms [49], [100]. Due to its high computational cost, it is usually used as a complementary aid

for other algorithms, commonly particle filters and/or a graph representation, instead of being used

as the core algorithm of the tracker. Breitenstein’s [14] experiments show that adding a MHT layer

improves the results from a single algorithms implementation, in this case using MHT as an addition

to PF, in particular in soccer with a Multiple Object Tracking Accuracy (MOTA) higher than 80% for

the scenes selected for evaluation, however, identity switches persist due to the use of particle filters

and robustness is low during the initial sequences.

Therefore, most improvements on tracking results due to robustness on occlusion events are found

from a mixture of algorithms, usually by mixing MHT with other algorithms (Kalman filter, PF, ) or

similar approaches like a multi-layer outline, or, with PF based algorithms [9], [58], [101], where MHT

has proven to be comparable to ”stat-of-the-art methods on standard benchmark datasets” even in its

early 90’s implementation [40], [49].


12

Robustness against identity ambiguity

occlusionevent, as stated before with different examples, is one of the biggest problems in optical object

tracking. Of the several ways to fight it, some examples are: improving the current algorithm with

more features, omit occlusionand care only for previous and posterior identities, increase points of

view adding more cameras, and treat the tracking as an optimization problem via multiple layers of

assignment (typically via MHT or graphs plus additional criteria).

Hess & Fern improve the PF with their Discriminatively Trained Particle Filters [32], which is an

alternative for the typical particle filtering approach by training the data set of the players with features

as appearance (size, color), motion (via 11 different probabilistic features) and player interaction (to

prevent particles mixing between different players), all features constantly updated frame to frame.

This method increases the complexity of implementation of the algorithm as it requires to update

the features and train the data set for each player, but shows great robustness when occlusionoccurs,

especially considering that the algorithm was implemented in American Football, sport with more

contact between players than Association Football.

Ok, Seo & Hong attack the particles mixing problem when using PFs by implementing an Occlusion

Alarm Probability (OAP) [60], intended to repel the particles from sticking between players in multi-

target tracking, thus preventing particles mixing and Identity Swap. The results show that there are

some occlusioncases in which the repel leads to unassigned particles to one of the players, after the

occlusionends, the player previously hidden is now considered as a new player (blob) entering the

game due to the missing information from its previous states during the occlusion, this due to not

considering tracking splits scenarios.

Sabirin et al. [74] detect early occlusionevents by defining a rectangular Region of Interest (ROI) to

separate each player, rather than a tight contour, when two or more of these regions are overlapped by a

60% of their area, an occlusion is detected. In general, occluded players are considered a big single ROI

during the occlusion event, their identities are assigned after the event, when there’s enough information

to identify the players during the occlusion, single regions are assigned preserving the shapes from

previous frames, updated just after the event finishes. There’s no complete identity solution during an

occlusionevent, making the algorithm robust to occlusions (the identities are recovered after the event)

but not occlusionsolving. It is important to highlight that Sabirin included players’ texture as a feature

for object identification, alongside other features as position and estimated motion based on optical

flow in the hue channel, to identify different players, showing it as an useful feature allowing easily


13

differentiation from different teams and also being the key on the algorithms robustness to occlusion.

However, the method is dependent on the segmentation results, as many others, thus the importance

of efforts on improving the frames segmentation.

Morais and Xu [55], [98], although working with the indoor sport futsal, both suggest the use of

multiple cameras to improve the players distinction, as different planes will add position information

that a single plane cannot easily obtain, the positions of the players are estimated using homography.

Iwase, S. & Saito, H. implement multiple view images in outdoors scenarios locating cameras around

the game field [36]. Even though the use of multiple cameras has proven to improve the segmentation

results and diminish the occlusionissues (especially when the field is surrounded by cameras and 3D

information is obtained), it is dependent on the optimal different views needed to prevent occlusion,

which can anyway happen for non-rigid 3D objects in motion, e.g. one or more players may fall over a

single player so that it is completely hidden from any camera, or there may be a group of more than

two players colliding, which leads to players ambiguity associations. [42] also mixes information from

multiple camera views into an MHT environment, using 3D information to aid correct identification of

players during an occlusionevent. To prevent occlusionusing multiple cameras, new research questions

may arise like: how many cameras? And, where to place them to cover the most, if not all, the 3D

space? Thus, making the solution hardware dependant. In [49], the previous questions are exemplified

by a variety of camera configurations, many different ways to place the camera and occlusion persists

being a nuisance.

Jiang [38] represents the possible locations of a single object as nodes of a graph, building a

multi-layer graph with the occupation probability of the object, a set of graphs would represent the

multiple objects to track. Shitrit et al. [80], in a similar way, implement a graph representation for

players’ tracklets, reducing the problem to a multi-commodity network flow optimization, using K-

shortest path to solve the assignments on a multi-layer graph. These representations work similar to

a MHT approach, where different tracks are preserved and optimized to find the best later on, instead

of picking local bests on each iteration of the algorithms. The above implementations resemblance a

MPG with MHT approach, differing in the way the graph is constructed, which is considered on the

current work’s implementation as a MMPG.

In [50], it is used an appearance-based model MHT, in which they consider appearance and

motion cues to refine players’ identification. The combination of different features in order to improve

the segmentation, as stated by Manafifard, is a needed modification to data association, as these


14

methods usually start by considering that the nodes to associate (in a graph representation) are the

right measurements. They show that an appearance-based model MHT performs better than a simple

MHT algorithm, giving it the a better discrimination of players and false alarms. This consideration

is taken into account in the current project, where the nodes of the graph are not only a label and a

position, but a set of optical features that weight on the association solution phase.

MHT outperforms those methods using single hypothesis in the case of occlusion and noise [49],

however, it is still not a flawless method, as it depends on the estimates of the measurements within

an occluding blob, where a sudden change on velocity (reality differs highly from estimation) or a

simple bad estimate results in tracking error [39]. Also, it has a high computational cost that makes

it unattractive for real-time applications, nonetheless, it sets an interesting paradigm from which a

multi-layer structure for assignment solving can be implemented [38], [80], which sets a different way

to resolve the typical graph optimization problem. The previous leads to the idea of implementing a

multi-hypothesis/multi-layer structure as a mixture of MHT and MPG, a Multiple Multipartite Graphs

(MMPG) tracker.

Artificial Intelligence

More recently, with the approvals and standardization in the use of performance and tracking tools

in sports like soccer [23], [24] and the raise of Artificial Intelligence (AI), AI powered tools have

been incorporated to aid in player’s performance evaluations via automated optical tracking, from

the generation of ground truth data sets to train, test and validate the tracking solutions [61], to the

integration of AI to track the players real-time from soccer matches recordings, detecting off-sides,

goals and tracing players’ trajectories [2], [4], [66], [67], [78], [85].

However, it is important to highlight these solutions do not rely solely on artificial intelligence, they

follow the thread of combining different tools and methods into a robust and more complete platform,

including but not limited to wearable trackers, installation of multiple cameras mounted underneath

the roof of the stadiums ([78] use 12 dedicated tracking cameras) and human aid to monitor the results

and make corrections as needed, since even with AI errors can occur, as recorded on 2020 when an AI

camera operator confused the head of a bald referee as the soccer ball [3].


15

1.4 Problem Description

Research question

Will a Multiple Multipartite Graphs approach improve precision of soccer players’ tracking

in comparison to the current method based on single Multipartite Graphs?

Problem

Automatic multiple object tracking is not a flawless process, there are different events that make it a

difficult task, in particular, the occlusion events. When performing optical object tracking, occlusion

events cause information losing that leads to wrong or incomplete tracking results. Many algorithms

have been implemented to solve this issue, but they fail one way or another, making it necessary to

add a final layer of human aided correction, thus current tracking platforms are not fully automatic.

1.5 Hypothesis

By implementing a Multiple Multipartite Graphs data association tracking method, it is

possible to improve the occlusion event management of the tracker, which will lead to a

metric value of MOTA greater than 75% on a whole match.

1.6 Objectives

General Objective

Implement a Multiple Multipartite Graphs occlusion solver to manage occlusion events on automatic

soccer players’ tracking from Ultra High Definition video.

Specific Objectives

• Develop the framework needed for the Multiple Multipartite Graphs tracker.

• Design a metric to detect simple occlusion events on soccer matches.

• Implement Multiple Multipartite Graphs occlusion solver.

• Validate the algorithm against annotated ground truth datasets.

• Disseminate research to scientific community.


16

1.7 Scope

The current project focuses on the occlusion events problematic, it departs from the previous ACE

platform version which includes multipartite graphs to perform players’ tracking, using panoramic field

view in ultra high definition (UHD) images from the stitching of two 4k recordings of each half of the

soccer field. The platform will transform from using single multipartite graphs to a Multiple Hypoth-

esis Tracking scheme, self-named Multiple Multipartite Graphs. Some other changes are considered

to improve the platform, according to different specific needs of the project encountered during its

implementation.


17

Chapter 2

Theoretical Background

This chapter covers important concepts and key theory regarding Multiple Object Tracking (MOT),

Occlusion, graph representation and the intuition on Multiple Multipartite Graphs (MMPG), as well

other concepts related to complementary integrations to ACE tracking platform with the goal of

improving the tracking results.

2.1 Multiple Object Tracking - Soccer Players Tracking

To perform soccer players tracking, a single player must be distinguishable from any other player along

the match. This task, as easy as it may seem for the trained eye, is a hard task in object tracking,

taking into account the next considerations: not relevant objects (e.g. goal posts, banners, audience)

must be filtered from the objects of interest (players, ball, referee), each player must be identified in

each single frame, the same player must be identified throughout the video (consecution of frames), the

referee is not a player (can be or not removed from the tracklets depending on the desired analysis),

there are active players substitutions during the match with fresh players, both teams fight for the

same ball leading to occlusion and tons of players interaction, and many others. To cover all of these

and more, object tracking is separated into different steps dedicated to specific tracking tasks.

Three key steps can be recognized: detection of objects of interest, tracking of such objects from

frame to frame, and analysis of object tracks to recognize their behavior. The whole tracking task

can be defined as “the problem of estimating the trajectory of an object in the image plane as it

moves around a scene” [99]. Tracking objects can be messy due to loss of information caused by

projecting the 3D world onto a 2D image (some platforms like SportVU [90] operate directly with 3D

data from multiple views coupling), noise in images, resolution of the video recordings, complex object

motion, scene illumination changes, partial and full object occlusions, and other problems application

dependent; each step has its own difficulties.

The three key steps mentioned above can be recognized in the implementation of the computational


18

platform ACE. The Pattern Recognition and Intelligent Systems Laboratory (PRIS-Lab) develops

a computational platform for soccer digital video analysis, called ACE, platform with the task of

providing automated soccer matches analysis to human motion scientist, coaches, teams and general

public. This platform is implemented in a multi-layer architecture comprised of two main stages:

perception and cognitive [82].

First stage is covered by the temporal and spatial segmentation blocks. The temporal segmenta-

tion performs detection and classification of scenes with players information (applicable to videos of

broadcasted soccer games) to remove those recording scenes that have no information of interest, like

advertising spaces; if the soccer game is recorded in panoramic view of the whole field, this block is

not needed. The spatial segmentation detects and localizes the objects of interest in each frame of the

scene.

The second stage is covered by the trajectory tracking and semantics of the game’s tracked activity.

The trajectory tracking is where the position-object assignment is performed, using as input data the

results from object segmentation. The semantic analysis performs a reinterpretation of the trajectories

to detect actions and classify events, this block is responsible of generating statistics, occupational

map and other information to provide to the customer [81].

Figure 2.1: ACE computational platform architecture layers

Different methods and algorithms are used to try to solve each object tracking step. As the project

focuses on the players trajectory tracking, the current scope prioritizes methods and algorithms directly

related to the trajectory-object association. The inputs of the algorithms to study are the objects

spatially segmented by the previous block in ACE’s flow, i.e. player blobs, block which graphically

identifies them by drawing a contour around the player’s detected shape, the centroid of the contour


19

is used as the estimated position of the player. It is important to highlight the influence of the results

of the segmentation task into the trajectory tracking, any error in the players’ segmentation and

detection will highly lead to a wrongful tracking. Details of the base implementation of ACE platform

are discussed in [92].

To understand the complexity of assigning the positions detected in a set of frames to a player

and identify its trajectory, it is important to know how a typical soccer game is broken down. First,

the game officially starts with all 22 players ”easily” distinguished as they are separated one another

following a game strategy formation, both teams start located in opposite sides of the field. As the

game begins, the players start to move over the field reacting to the game events as the match unfolds,

a player may decide to stay in position, move arms or crouch (shape change), jump, go forward or

backward, left or right, follow a player or the ball; as the ball moves from one side of the field to the

other, the player may change its previous intended trajectory to another; the motion is errant and not

much predictable without the semantics of the game, even though there is a general strategic plan.

If the players would not interact one with another, the trajectory-player assignment may not be as

hard, but, as the game definition states, both teams are fighting for the same ball, this leads to more

than one player sharing a close range to the ball as they struggle to control it, possibly generating

occlusion when one player is placed behind another (from the perspective of the 2D image recorded

from the same view plane); which player is which? Using the colors of the soccer suits is useful only

if the players are of different teams, but if the players are of the same it won’t help much, even less if

it is confused with the background (see Figure 2.2).

Figure 2.2: Players detection issues due to visual properties, scene conditions and segmentation
process. Results from ACE platform, 2018

Automatic object tracking has to deal with previous mentioned issues and many others. Most of


20

these issues correspond to the first stage, specifically from spatial segmentation, as they are dependent

on the image characteristics. A perfect segmentation or feature extraction would simplify the second

stage, where the association algorithms could operate with clean and organized information. However,

as these issues are innate to image processing, data association and object identification have to deal

with messy and flawed detected objects. Even though there are different methods that try to improve

the quality of the detection phase results, some events persist to the other phases, like occlusion events,

resulting in the identity assignment stage consuming data with those problems, causing the previously

mentioned errors like Identity Swap, missing identities and spurious objects.

2.2 Occlusion

When using images, the features extracted depict visual characteristics of the scene under evaluation,

one way or another, they describe each object contained in the scene. However, images are but a

projection of a 3D world into a 2D plane, where these visual characteristics are transformed and

even hidden, making them hard to extract sometimes. Typical features extracted in this context

relay on pixel color (using histograms of color in different representations like YUV, RGB or HSV) to

detect areas of interest in the image (background extraction, possible objects detection) and shape/size

of the objects to filter spurious detections. However, some information is lost in the 3D to 2D world

transformation, sometimes vital information like depth location of the objects, which leads to occlusion

events.

In an occlusion event, objects that share a common axis location, but different depth from the

camera, appear to share the same spatial location, as depth information is lost in the plane of the image.

When this occurs, it is hard to identify which object is which, as figures and colors are mixed and

the common segmentation methods fail to distinguish each object involved; this particular scenario, as

stated in [49], is the biggest challenge in MOT. Figure 2.3 illustrates occlusion events on real soccer

matches, even though two close players are not visually blocked for the human eye, its proximity and

visual features can lead either segmentation or tracking algorithms to give a faulty result (figure 2.3a).

Some other events are more complex and practically impossible to automatically correct (figure 2.3b).


21

(a) Minimum proximity between players is considered as an occlusion
event by the algorithm, which groups both players as a single one.
[74]

(b) Four players occlusion on TV
broadcast video.

Figure 2.3: Examples of scenarios perceived as occlusion events.

The existence of occlusion events, and a segmentation process that is easily affected by it and

many other noisy situations, makes it a need for the identity assignment phase to be robust in order to

deliver a clean tracking output, it must deal with false detections and missing information. However,

such a perfectly robust algorithm does not currently exist, and the current ones which outputs have the

higher accuracy percentages are not fully reliable on its own. Recent approaches tend to mix different

methods to aid the core algorithm into solving assignments, be it hardware solutions like increasing the

points of view, or a mixture of feature models and tools [45], [49]. In terms of occlusion , three ways

to confront the event were identified: no-handling of the event, event detection, and event solving.

No-handling implies that occlusion is not considered as an input event for the identity assignment.

These methods usually rely on the ability of the algorithm to make corrections or its robustness to

false detections. As discussed previously, there is no flawless algorithm, and this approach will lead to

an incomplete tracking history that requires human aided cleaning of the results. Other way of not

handling the occlusion when tracking is trying to prevent it from happening when capturing the scene,

this can be done by increasing the points of view to perform a 3D projection of the field. With depth

information, most occlusion scenarios are dismissed in exchange of having to care about homography

and camera placement. However, even in a 3D projection some occlusion can prove to be challenging

(like in figure 2.3b). More recently, with advancements on technology and its usage approval in official

games [23], [24], tracking is aided, if not even replaced, by alternatives to optical image segmentation,

the use of GPS trackers provides an accurate occlusion-free extraction of the players’ position at any

point of the game with a highest refresh rate than that provided by 30 or 60 FPS videos, simplifying

the tracking challenge to matching the GPS tracked position to coordinate in an image [88].


22

Event detection is the ability to recognize occlusion events, usually indirectly by human interpre-

tation of tracking information, but to decide to dismiss it, i.e. identities are assigned before and after

the frames in which the occlusion occurs, but during the event there is no verification of the identities.

This can be seen in [25], [26], where the missing nodes in the sequential graph hints an occlusion event

(the graph passes from having two identified players to only one during many frames, and then again

having two players, see figure 4.1). However, there is no occlusion handling, the missing information

is not corrected and the tracking output only shows the player missing in some frames, but it is con-

sidered enough to know the players’ position before and after the occlusion , as sometimes the events

are only a few frames long, in the best of cases. However, scenes are not always so simple and there

occurs identity swapping and false-new-player labeling after the occlusion event.

Figure 2.4: Sequential graph where missing identities suggest an occlusion event [26]

Some occlusion detection techniques are used to improve the main algorithm behavior, like in [60]

where occlusions are detected by a proximity alarm in order to disperse the particles from the Particle

Filter (PF) that is used to track players. Another example is [74], that uses occlusion detection to

trigger occlusion handling.

Occlusion handling refers to trying to solve identity assignment during an occlusion event, and/or

to verify and correct identity assignment before and after the occlusion occurs. Some examples are

[25], that builds forward and backward graphs of the frames with missing information in order to

merge data prior and after the occlusion occurs; [74] uses different player attributes (like color and

movement direction) and prior-occlusion Region of Interest (ROI) placement to define the position of

missing players during an occlusion event.

Handling and solving occlusion requires at least three phases: detection of the event (in which

frame it starts and when it ends), revealing hidden players, and identity preservation during the event.

It is important to note that this is a simplification of what it would take to solve occlusion , as there are

different degrees of complexity in which it can occur depending on the number of players involved and

if there are other dynamics like players intermittently occluded while others are permanently occluded.


23

Occlusion event classification and identification

Occlusion events are different and vary in complexity. Simple events as a short sequence of two players

can be solved in most cases, for example with a forward-backward sequential graph [25] or with PF

[32], however, even in these scenarios occlusion solving effectivity is not 100% and there exists many

other scenarios more complex that are yet to be confronted.

An occlusion event O can be identified by its length in video frames, and its complexity. The

length can be represented either by the number of frames or by the initial and final frames ki, kf . The

Occlusion Event Complexity (OEC) is a score that depicts how complex an occlusion event is in terms

of the involved objects’ interaction and the occlusion area objects density, where O1
5,30 is an occlusion

event that occurs from frame k = 5 to k = 30 and has an Occlusion Event Complexity Score (OECS)

CS = 1, where the OECS is the product of the Occlusion Dynamics Index (ODI) and its Occlusion

Agglomeration Size (OAS), to be detailed later.

To study the event’s complexity, it can be categorized by its Dynamics as Steady Occlusion Event

(SOE) or Dynamic Occlusion Event (DOE), and by its OAS, represented by the number of players

involved. From the dynamics perspective, it considers the interactions of the players during the event.

It may happen that during the whole occlusion there were only two players present, both move to

posses the ball, fight for it for some frames, and then one takes possession of the ball and runs faster

than the other bringing an end to the occlusion . The previous example depicts an event in which the

OAS is constant, but it may vary in more complex events.

In the case of an occlusion that starts with two players for a couple of frames and later changes to

three players when a new one gets close to aid its teammate, the OAS varies, and thus the complexity

of the scene increases. In this case, let OCS
5,30 be the occlusion event described previously, where oCS1

5,10

and oCS2
11,30 are the sub-events that comprise OCS

5,30 (event starts with two players and ends with three),

the OAS is useful to break off a complex event into several simpler ones. However, the event can get

even more complex by simply adding intermittency to one of the players.

If we take the previous example and add intermittency, let’s say that the third player enters and

leaves the Occlusion Event Region (OER) several times, the OEC increases as the tracking algorithm

must deal with an intermittent merge/split for several continuous frames, result of the third player

interacting with the previously occluded pair. If there are many intermittent players, the complexity of

the scene increases even more. Thus, we can identify a SOE as one that has no intermittency, with an

Occlusion Dynamics Index Intermittency weight (ODI-I) of 0; and a DOE as one with intermittency,


24

with an ODI-I of the number of intermittent players.

Other factor to consider for the OEC is if the players interacting are from the same team, as similar

color distribution leads to Identity Swap, it is easier to distinguish players from different teams. In

this sense, we define an Occlusion Dynamics Index Team weight (ODI-T) of 1 if there are players from

different teams, an a ODI-T of 2 otherwise. The final ODI is the summatory of the ODI-I and ODI-T,

e.g. an occlusion event with two players of the same team (ODI-T of 2) where one is intermittent

(ODI-I of 1) would have an ODI of DyIx = 3. To calculate the final OECS of this example, we use its

OAS, with a value of AgIx = 1 as there are only two players present, and multiply it by the DyIx,

the OECS of this simple event would be CS = DyIx ∗ AgIx = 3 ∗ 1 = 3; if we take an scenario

with the same OAS but with no intermittency and players from different teams (DyIx = 1) the final

OECS is CS = 1, which is the simplest possible occlusion event (a CS = 0 would mean no occlusion

is happening).

CS = DyIx ∗AgIx = (ODI − T +ODII) ∗ [1 + (x− 2) · 1x 6=2] (2.1)

This project will focus on occlusion events with an OECS CS = 1, 2, which are simple occlusion

events, cases where CS >= 3 will be tested, as those are interesting and challenging scenarios (specially

if the high score is due to intermittency), but no specific result is pursued over it. The most common

OECS are yet to be obtained from scene labeling on real matches, to be shown in further results.

2.3 The Graph Representation

A graph G is a mathematical structure that shows relations between the objects it comprises, it mainly

consists of two things, a set V = V (G) whose elements are called vertices, points, or nodes of G, and

a set E = E(G) of unordered pairs of distinct vertices called edges of G, such a graph is denoted by

G(V,E). If there is an edge e = {u, v} (the edge e connects vertices u and v), then vertices u and v

are adjacent and are the endpoints of e [73].

Graphs can be pictured by diagrams where each vertex v in V is represented by a dot or a small

circle, and each edge e = {v1, v2} is represented by a curve which connects its endpoints v1 and v2 (see

figure 2.5 for a generic example of a graph).


25

v1

v2

v3

v4

v5

e1

e2

e3

e4

e5

e6e7

e8

e9

v1

v2

v3

v4

v5

5

8

10

9

3

124

1

9

Figure 2.5: Graph G pictured by its vertices and edges relations (Left), example of a weighted graph
(Right).

If graph G is assigned data to its edges and/or vertices, it is called a labeled graph, in particular,

G is called a weighted graph if each edge e is assigned a number w(e) called the weight or length of e,

and thus an edge-weighted graph (see figure 2.5).

A complete graph is a graph G in which every vertex is connected to every other vertex in G (see

figure 2.6). A graph is directed if it is made up of a set of vertices connected by edges, where the edges

have a direction associated with them.

v1

v2

v3

v4

e1

e2

e3

e4
e5

e6

Figure 2.6: Complete graph example

A multipartite or k-partite graph G is one in which its vertices V can be partitioned into k different

independent subsets such that vertices of the same subset are not adjacent. If the amount of subsets


26

k = 2, it is said to be a bipartite graph, if k ≥ 3, it is said to be a k-partite graph. It is a complete

k-partite graph if every pair of graph vertices in the k sets are adjacent, if the subsets have p, q, ..., r

vertices respectively, the complete k-partite graph is denoted Kp,q,...,r (see figure 2.7).

u1

u2

u3

u4

u5

v1

v2

v3

v4

U

V

Figure 2.7: Bipartite Graph Example

In the case of ACE platform, a series of k frames are represented by a directed edge-weighted

k-partite graph mpg, where the vertices (nodes) of each subset k represent a set of characteristics for

each object of interest (players, ball and referee), and the weighted edges represent a cost of making

the nodes adjacent (i.e. the weight of node ui from subset k-1 representing the same object as node vj

from subset k). The mpg is initiated as a complete graph, and is then minimized by a shortest-path

algorithm to obtain a graph where ideally each node has exactly one adjacency (see figure 2.8). If

there were no faults when tracking the players, each subset k would have the same exact number of

nodes (not considering player expulsions or replacements) and the ideal minimized graph would be the

common output, however, that is not the case in a real scenario.


27

u1

u2

u3

u4

v1

v2

v3

v4

U V

u1

u2

u3

u4

v1

v2

v3

v4

U V

Figure 2.8: Bipartite graph example with 4 nodes in each subset. Left: Complete Bipartite Graph.
Right: Minimized Ideal Bipartite Graph.

2.4 Multiple Multipartite Graphs

Object tracking involves detecting and identifying the object of interest in an image k, and preserving

its identity in the next images {k + 1, ..., k + i} from a sequence of consecutive images (detecting

and identifying the same object as itself in each image). Different features are extracted to estimate

the object’s position in the image, in this sense, the identity assignation and position estimation are

considered to be an assignment hypothesis. Different considerations are taken to determine if that

hypothesis actually represents the desired object, in a probabilistic tracking approach, each possible

assignment with a high probability is a hypothesis (for example each particle in a PF[22]), where the

tracking result can be a single hypothesis with the highest probability, the average of the hypotheses

with highest probability, or a region of high probability. In a data association tracking method, in

particular a Multipartite Graph (MPG) approach, each node adjacency combination is a hypothesis

(see figure 2.9), it can be said that a complete MPG is one that represents the whole hypothesis space.


28

u1

u2

u3

v1

v2

v3

w1

w2

w3

U V V

Hypothesis 1

u1

u2

u3

v1

v2

v3

w1

w2

w3

U V V

Hypothesis 2

u1

u2

u3

v1

v2

v3

w1

w2

w3

U V V

Hypothesis 3

Figure 2.9: 3 assignment hypothesis examples for a multipartite graph of 3 subsets with 3 nodes
each.

When considering the features extracted and other constraints (like distance and speed constraints)

many of those hypotheses can be easily discarded, as they are not possible due to known physical

interactions (e.g. a player cannot be on one side of the field in frame k and on the other side of the filed

in frame k+1, if consecutive frames represent a lapse less than a second). For the plausible hypotheses,

usually a shortest-path algorithm is used to determine the best fit minimized graph according to the

different weights assigned to each edge (from features and constraints). However, the usual approach

is to build a MPG for a fixed window size of K frames, and pick the very best assignment hypothesis

and discard the rest, which would be the final solution if we suppose that the data from which the

MPG was built is accurate with no noisy, spurious or missing identities, what happens in real case

scenarios, thus the local best hypothesis may not be the global best, or even the best for a MPG that


29

considers the frames from the same window and the next one.

One way to illustrate it is by comparing the hypothesis decision by shortest-path to what some-

times occurs with a GPS map navigator. It can occur that the driver is suggested to take an exit to go

through a path which is just 1 second faster than the current course, the driver may decide not to do so

considering that 1 second is not gain enough and that taking the exit would lead to a known hill that

the tracker dismisses as it lacks topography information, leading to a slow down and therefore lasting

even more than estimated, the algorithm is designed to pick the shortest path from the currently

available data. It can be argued that incorporating a way to calculate the estimated time of arrival

(ETA) considering the hill slow down would improve ETA’s accuracy, let’s now consider the case in

which the navigator does not know beforehand every route from a map, but it generates the best route

evaluating a portion of the map by fixed size windows as the car advances, then, until the window with

the portion of the map that includes the hill is processed the algorithm won’t take into account its

existence, it may still offer the driver to continue to the hill, but it is only after the algorithm detects

the hill that it can conclude that the current path does not lead to the global optimum. Similarly

when tracking football players (specially if the final goal is to track live players), when the current

window of frames is being processed, the algorithm does not know how the batch relates to the next,

if there is loss of information in the next batch or if the current one is missing a player that is detected

in the next; the shortest-path approach picks the local best hypothesis, which may not coincide with

a global best hypothesis.

The above scenario may result in cuts in motion flow, for example a motion scene being composed

of two MPG due to window sizing and sequential graphs creation instead of a single MPG, an occlusion

event could be split into several MPG with its own faulty local optimum that makes no sense once put

together; in the end, the final graph representing the whole match is a chain of local bests hypothesis

added sequentially, which may not have the same results as a MPG made from the entirety of the

match’s frames, more study is required to understand the extent of the effect of different window sizes.

As of now, the tracking phase is fed an optimum initial state that may or may not include errors and

assume it as the truth so as to link each sequential batch (hence the importance of a clean segmentation

and detection before the tracking phase to prevent spreading errors the tracking algorithms will take

as truths). Figures 2.10 and 2.11 compares the general processing of the MPGs of a scene comprised

of 6 frames with a window size 3 and 6, respectively.


30

u1

u2

u3

v1

v2

v3

w1

w2

w3

x1

x2

x3

y1

y2

y3

z1

z2

z3

U V W X Y Z

MPG frames
1-3

u1

u2

u3

v1

v2

v3

w1

w2

w3

x1

x2

x3

y1

y2

y3

z1

z2

z3

U V W X Y Z

Reducing
MPG frames
1-3

w1

w2

w3

x1

x2

x3

y1

y2

y3

z1

z2

z3

W X Y Z

MPG frames
4-6

w1

w2

w3

x1

x2

x3

y1

y2

y3

z1

z2

z3

W X Y Z

Reducing
MPG frames
4-6

Figure 2.10: Multipartite graph processing with a window size 3 of a scene comprised of 6 frames.
Subsets U, V and W represent the first 3 frames of the scene, as the window size is 3, these subsets
comprise the first MPG, it is minimized and the final subset W is the starting point for the minimization
of the next MPG for the last 3 frames of the scene.


31

u1

u2

u3

v1

v2

v3

w1

w2

w3

x1

x2

x3

y1

y2

y3

z1

z2

z3

U V W X Y Z

u1

u2

u3

v1

v2

v3

w1

w2

w3

x1

x2

x3

y1

y2

y3

z1

z2

z3

U V W X Y Z

Figure 2.11: Multipartite graph processing with a window size 6 of a scene comprised of 6 frames.
Subsets U, V, W, X, Y and Z represent the whole scene in a single MPG, which is then minimized
with the whole scene’s information.

Considering the local optimum as an assignment hypothesis (one possible assignment solution

based on the available data), let’s suppose the tracking can be improved by holding the assignment,

or hypothesis selection, until there is more information for an actual best pick (we have knowledge

that there is a hill coming, so let’s be flexible with the MPGs optimization). Keeping a set of best

local optimums instead of only the best local optimum grants a set of alternative assignments the

tracker can piece together into building a wider optimum without committing to the initial assignment

based on limited scene information. The idea of propagating multiple assignments hypothesis was first

presented in 1974 by Singer, Sea and Housewright [83] to be used in dense multitarget environments,

and later expanded in 1979 by Reid [1], presented as Multiple Hypothesis Tracking (MHT) algorithm, a

systematic approach to solve tracking of multiple objects in a complex scenarios. The algorithm mainly

consists on creating assignation hypothesis and propagating them as new measurements and detections

are incorporated into the assignment space, usually depicted as a decision tree that encompasses the

hypothesis space (figure 2.15). Those discarded hypothesis (branches) are pruned from the hypothesis

tree, each new measure adds new information and thus creates a new subset of hypotheses (hypothesis

branches).


32

Figure 2.12: Hypothesis Tree: tree representation of formed hypotheses. [6]

There exist different implementations of a MHT algorithm [1], [6], [8], [13], [18], [29], as well as

variations depending on the area of research, improvements or the algorithm it is mixed with [14], [30],

[42], [50], [58], [101], but two key aspect remains, the first one is its high complexity and computational

cost due to the hypothesis creation process (hypothesis can become virtually infinite if not limited) that

makes it a requirement to use a robust pruning method and to limit its length with a fixed maximum

size. This issue is similar to the one previously discussed with MPG, however, due to its complexity

and cost, it is not desired to use MHT for the entirety of the match but rather in sections where it

may come handy, like in occlusion events (here it is paramount the idea of occlusion events detection

and classification).

The second aspect regarding MHT is its tree-like shape, comparing the main structure of a Hy-

pothesis Oriented hypothesis tree [6] with an MPG (figures 2.9 and 2.11), where each branch of the

tree is a hypothesis and each possible assignment combination in a MPG is a hypothesis (taking each

subset as a single node of a hypothesis branch), the possible MPGs can be arranged each as a branch

of a hypothesis tree, thus making the whole hypothesis tree into an MMPG space (figure 2.13), where

hypothesis pruning would mean to discard an MPG, and hypothesis propagation the incorporation of

new subsets, i.e. expanding each hypothesis/MPG with new measurements from the following video

frames. Each new subsequent MPG window adds a new set of hypothesis branches to the existent ones,

making the hypothesis tree grow.


33

u1

u2

u3

v1

v2

v3

w1

w2

w3

U1 V 1 W1

Hypothesis 1

u1

u2

u3

v1

v2

v3

w1

w2

w3

U2 V 2 W2

Hypothesis 2

u1

u2

u3

v1

v2

v3

w1

w2

w3

U3 V 3 W3

Hypothesis 3

——————————————————————————————————————————-

u1

u2

u3

v1

v2

v3

w1

w2

w3

U

V 1 W1

Hypothesis 2

v1

v2

v3

w1

w2

w3

V 2 W2

Hypothesis 1

v1

v2

v3

w1

w2

w3

V 3 W3

Hypothesis 3


34

——————————————————————————————————————————-

U

V3 W3

Hypothesis 2 V2 W2

Hypothesis 1

V1 W1Hypothesis 3

Figure 2.13: Depiction of Multiple Multipartite Graphs structure based on a Hypothesis Tree, where
the first common subset is the starting point, each branch is a hypothesis composed of a possible
reduced MPG

Following the previous intuition, an MHT-like structure can be obtained by an MPG of MPGs,

where each MHT represents a hypothesis branch.

2.5 Player Identification

An essential aspect of a correct object tracking is the ability to retain the unique identity of each

object throughout the video. In optical object tracking, characteristics extracted from the video can

be used as a first layer of attributes that distinguishes one object from another; properties derived from

the first set of attributes, as speed or trajectory direction, can be used as complementary attributes.

Moreover, building a set of attributes can be extended to groups of objects to determine belonging to

a certain group (spurious blobs vs player blobs, referees vs players, team A vs team B, etc).

Player Model

From a 2D image object identification standpoint, an object of interest is but a cluster of pixels that

standout from the rest of pixels in the image. From these pixels some attributes can be extracted to

build the object’s characteristics that make it a unique object. Directly from each pixel we can extract

the color information in different formats, hue, saturation, its position relative to the global frame of

reference; from the cluster of pixels we can calculate histograms of visual properties, centroid, contour,

area, perimeter, aspect ratio, and more. This first set of attributes becomes useful when identifying

objects in the same image scene, as the slightest variance can be used to distinguish between similar

objects. These attributes can also be used to filter between spurious identifications and actual objects


35

of interest, e.g. using aspect ratio to remove small objects detected from the scene’s background noise,

or using the HSV histogram to filter false positive clusters of pixels that belong to the soccer field

markings and not to a player.

After a proper tracking of an object across multiple sequential images, some other attributes can

be estimated from the historical data, from the record of previous positions the speed and direction of

the object can be estimated, which can be used to predict the object’s next position.

Team Model

Similarly to an individual player, a group of players can also be identified by a set of attributes. In the

case of soccer, the game is designed to have the players belonging to one of two teams, where the players

of the same team wear uniforms of visual similarity (color, shape, patterns) with the exception of the

goal keepers. Sometimes a spurious blob can pass a filter designed from considering individual objects,

i.e. a player is expected to have some aspect ratio or a size within some margins, but spurious blobs

can satisfy these conditions with unexpected shapes and produce false positives. Using the attributes

derived from the common properties of a group of known players of the same team can be used as

an additional filter where a blob is a player blob only if it fits the expected pattern of one of the two

teams.

This useful team model, however, depends on the correct identification of a player, which can lead

to a chicken and egg situation. In the case of knowing beforehand the initial position of the players,

this piece of information can be used to create a trusty team model, but a bit of caution to keep in mind

is that as the match advances and the conditions change (players’ being replaced, the uniforms getting

dirty changing the color distribution, abrupt scene light changes) this model can become obsolete, thus

appropriate tools to keeping this model valid are required to make the most out of it.


36

(a) Yellow player with its hue channel his-
togram

(b) Blue player with its hue chan-
nel histogram

Figure 2.14: Comparing players from different teams

Player similarity

Having determined the model of properties that describes a player, its identity is spread to the next

frame by comparing its set of attributes with the objects of the next frame that might represent the

same player. If the objects seem closely related, the identity is spread, otherwise no connection is

established between the objects.

Figure 2.15: Using similarity between blob models to perform identity assignment across sequential
frames

Each attribute is not treated equally, some have a greater impact in determining if the two objects

represent the same player, as is the case of the position, taking into account the physical limitations of

the players and the rate of the video recording (for example 30 FPS, two consecutive frames represent

1/30 of a second) it is impossible for one player to be positioned in one corner of the field in one

frame and in the opposite corner in the next frame, the huge difference in the object’s position rules


37

out the possibility of them representing the same player. In the other hand, other attributes are less

influential when determining if the objects represent the same player, as shape or the contained area,

as the players are expected to be humans running and kicking the ball with two legs. Not to say these

traits are not useful or cannot be used to distinguish players, but differences between detected objects

are expected to be low when considering this kind attributes.

Each attribute has its strengths and weaknesses when used for identifying objects (position helps

filtering distanced players but becomes a lesser indicator when comparing objects close together),

it becomes beneficial to use as many traits as available to cover a wider range of situations. Since

some attributes can be used more generally than others, these are integrated into comparing object’s

similarity with a weighted value that can be adjusted to determine the influence of each attribute.

Thus, we can define the dissimilarity between two objects A and B as the weighted difference

between the object attributes, where the lesser the value the more the similarity.

D(A,B) =

n∑
i=1

wi∆(Ai, Bi) (2.2)

Where D(A,B) is the total dissimilarity between objects A and B, n is the number of attributes,

wi is the weight associated with the i-th attribute pair (Ai, Bi), and ∆(Ai, Bi) is the delta (difference)

between the i-th attribute of object A and object B.

Next Position Prediction

When tracking projectiles or objects expected not to change much its trajectory [58], [77], a clear

mathematical model that describes the object’s trajectory is quite useful, whenever the optical tracking

fails, the tracker can rely on the model to predict the object’s position in subsequent frames until it

is detected again by the tracker. The trajectory model can also be used to trace back in time the

objects motion to estimate the trajectory in a time frame not present in the current data set. When

tracking objects with more complex motions or movement freedom, these models can become complex

to determine. In the case of human motion, in spite of having a general sense of where the person

might go based on an objective (the players follow the ball from one side of the field to another),

unknown factors can unexpectedly alter the behavior of the person, and thus its motion (the player

tripped over, slowed down due to fatigue, got distracted by a flying bird, does not know where the ball

is at and so it is not following it, it is positioning itself in advance of a future play).


38

PF comes handy in situations like these [32], [58], [65], where instead of predicting the players’

next position, a cloud of points represent the space of possible next positions, the player is then

predicted to be where most of these points cluster. This capability is not currently incorporated into

ACE platform, and integrating it is a whole project on itself.

Assuming some constraints in the motion freedom of a soccer player, a simpler approach can be

used. Considering the input video has a rate of 30 FPS, where the time change between two consecutive

frames is 1/30 of a second, taking into account the distance from which the camera is set from the

field to record the match, the pixel velocity can be calculated to describe the motion of players in the

scene, which can be used to expose how much change are we to expect in between frames for a player

in movement [33].

Keeping things simple, we can assume a rectilinear motion for immediate frames k and k+1, by

knowing the player’s direction and speed from the history of positions the immediate next position can

be estimated. How well the prediction fits depends on how well we can estimate the player’s speed and

direction, which depends on the length of the previous positions history, too short a history makes the

prediction too sensitive to new position information (direction may change on every new prediction)

and too long a history will include outdated position information that may prevent the direction to

be updated (the player moved to the right of the field for too long and then decides to turn the other

side, the history of ’right direction’ weight more in predicting the new direction than the new ’left

direction’).

Object Proximity

As stated before, distance between the position of objects is a useful metric that can be used to rule out

relationships between objects that are too far apart. In the case of occlusion probability, the closeness

between two objects can be used as an early indicator of an occlusion event. However, the distance

between two objects is usually measured between the objects’ centroids, but the occlusion can occur

with any part of the region each object holds, a bigger object can occlude a smaller object even when

the euclidean distance of the centroids is great. Figure 2.16 illustrates this problem by comparing

two set of objects with the same euclidean distance from their centroids, top shows the bigger object

partially occluding the smaller one, bottom shows no interaction between objects of same size.


39

Figure 2.16: Problem with occlusion and euclidean distance as proximity metric

To address the previous scenario, the proximity between two objects is defined taking into account

not only the (euclidean) distance between the objects, but also the size ratio between the objects

(width and height ratios), including weighted values to provide flexibility to the calculation on how

each attribute influences the proximity.

proximity = distance× (wdistance + (wsizeWidth × sizeRatioWidth + wsizeHeight × sizeRatioHeight))

(2.3)

Where:

• distance is the Euclidean distance between the centers of the two objects, with (xA, yA) and

(xB , yB) the positions of objects A and B, respectively