
EIoU-distance loss: an automated team-wise player detection and tracking with jersey colour recognition in soccer

Article: 2291991 | Received 18 Jul 2023, Accepted 01 Dec 2023, Published online: 03 Feb 2024

Abstract

The surge in demand for advanced operations in sports video analysis has underscored the crucial role of multiple object tracking. This study addresses the escalating need for efficient and accurate player and referee identification in sports video analysis. The challenge of identity switching among players, especially those with similar appearances, complicates multi-player tracking. Existing algorithms relying on manually labelled data face limitations, particularly with changes in jersey colours. This paper introduces an automated algorithm employing Intersection over Union (IoU) loss and Euclidean Distance (EUD), termed EIoU-Distance Loss, to track players and referees. The method prioritises identity coherence, aiming to mitigate challenges associated with player and referee recognition. Comprising BackgroundSubtractorMOG2 for player and referee detection and IoU with EUD for connecting nodes across frames, the proposed approach enhances tracking performance, ensuring a clear distinction between different identities. This method addresses critical issues in sports video analysis, offering a robust solution for tracking players and referees in dynamic game scenarios.

1. Introduction

In soccer videos, the players and the referees are the most prominent subjects. Player and referee identification and tracking are crucial for acquiring a complete understanding of a sports event and may be framed as a multi-object detection, classification, and tracking task, with the objective of determining the position of every player at all times and reconstructing trajectories from multi-view video streams.

Recognising and distinguishing players in real sports videos presents a formidable challenge, primarily because players are easily confused with opponents amid abrupt changes in direction and speed. Compared with general person tracking scenarios, commonly used re-identification attributes such as colour and gait prove less reliable in player tracking. Addressing player tracking with distinct identities involves tackling the following challenges: (I) Players often share a similar appearance due to wearing matching jerseys. Factors such as changes in body shape, unexpected movements, interference from spectators, and fluctuations in lighting further complicate the accurate tracking of players with unique identities. (II) The movement of players results in significant deformation of jersey numbers, particularly when they are at a distance from the cameras. This makes it challenging to discern and track the jersey numbers effectively. (III) Players frequently switch positions as a result of their unpredictable movements on the field. When player identification is inaccurate, it can lead to instances of mis-tracking.

Vision plays a pivotal role in achieving real-time object recognition and tracking in the context of soccer. Recent years have witnessed significant advancements in vision systems, boasting the capability to swiftly and accurately detect objects. The object detection and tracking methodology introduced here not only finds utility in real-time performance assessment systems but also in the domain of humanoid robotics. Performance measurement systems designed for both teams and individual players can unveil aspects of the game that elude the human eye. They enable the measurement of metrics such as player distance covered and relative positioning in relation to teammates and opponents. This valuable information serves multiple purposes, including the evaluation of individual player performance, assessment of fatigue levels, analysis of team tactics, and a thorough examination of adversaries’ strategies. To dissect the tactics employed by opposing players, an object detection method capable of identifying and categorising teammates is indispensable. This necessitates the extraction of the foreground, achieved by eliminating the background from the frames, a task facilitated by the built-in functions of OpenCV.

Research in this field currently grapples with limitations in its ability to effectively address pivotal challenges, including team player classification, referee detection and tracking, and identity switching. Consequently, there is a pressing need for the development of a novel system that employs deep representation to learn players’ identities and enhances the tracking mechanism by incorporating identity-related information. Furthermore, most contemporary methods for multiple object tracking rely on manually labelled bounding box data as input. Despite this, these state-of-the-art detectors and trackers often fall short when faced with the identical appearances of teammates, resulting in suboptimal player tracking outcomes. This gap underscores the lack of a comprehensive and reliable solution that seamlessly spans the entire spectrum of multi-object detection, identification, and tracking, a void that persists both in academic research and within the industry.

In summary, this study introduces a dependable method for recognising and tracking multiple players and referees in real-world scenarios. The primary contributions of this study include:

  1. The introduction of an unsupervised player and referee detection and classification framework, eliminating the need for manual labelling. This framework detects players and referees and categorises them based on their respective teams.

  2. The development of an efficient algorithm to tackle identity switching and enable the tracking of players and referees on a team-by-team basis, achieved through the utilisation of IoU and EUD methods.

This article surveys what is being done in the field of player/referee detection and tracking in sports videos, with an emphasis on soccer video. The article is organised as follows: Section II reviews the existing literature on detection and tracking. Section III presents the proposed methodology for player/referee detection and tracking. Section IV discusses the experimental implementation and results. Finally, the conclusion and scope for future research are provided in Section V.

2. Related work

In this section, we take a look at some of the most common architectures for object detection as well as some of the latest sports analysis techniques. Murthy et al. (Citation2020) discussed deep learning-based supervised object detection mechanisms. A review article presented by Naik et al. (Citation2022a) explored most of the aspects of academic research in sports activities.

2.1. Object detection

Faster R-CNN, proposed by Ren et al. (Citation2015), is a state-of-the-art object detection methodology that achieves remarkably promising performance on prominent datasets such as COCO using a two-stage detection approach. However, its computational cost is still considerable, necessitating a GPU to run it at practical speeds. As a result, single-shot detectors emerged: the YOLO family developed by Redmon et al. (Citation2016, Citation2017, Citation2018) and the Single Shot Multibox Detector (SSD) introduced by Liu et al. (Citation2016), which demonstrated enhanced speed and improved accuracy, as further evidenced in the findings of Huang et al. (Citation2017). YOLO was a single convolutional neural network that recognised objects by predicting bounding boxes and class probabilities. However, detecting small objects in crowded sports scenes, as discussed by Singh et al. (Citation2021), and achieving exact localisation for activity recognition, as developed by Guan et al. (Citation2011), remained a challenge. SSD (Liu et al. Citation2016) was subsequently offered as a way to improve on the YOLO approach: owing to its multi-scale feature maps and predefined-box mechanism, SSD can identify tiny objects and improve localisation accuracy compared with YOLO. YOLO, in turn, was adapted to address these issues; the more recent YOLO v4 used a more sophisticated deep architecture with additional layers and residual blocks to make detections at three different scales. Hu and Ramanan (Citation2017) used context information such as hair and shoulders to distinguish tiny faces, whereas comparable context information for balls or players is significantly weaker. R. Zhang et al. (Citation2022) introduced a method for converting features extracted by a pre-trained self-supervised feature extractor into a Gaussian-like distribution. B. Ding et al. (Citation2023) introduced an unsupervised transmission-aware dehazing module designed to enhance visibility and mitigate depth-dependent noise propagation within the dehazing process. D. Yuan et al. (Citation2022a, Citation2022b) devised an aligned spatial-temporal memory network-based tracking approach tailored to thermal infrared targets, and an adaptive spatial-temporal context-aware model within the DCF-based tracking framework aimed at enhancing tracking precision and minimising the impact of boundary effects. These studies showed that enhancing small-object detection can be effective while requiring fewer processing resources.

2.2. Player detection

Vision-based player/ball detection, proposed by Burić et al. (Citation2018), is essential in sports applications for analysing ball trajectories (Theagarajan et al. Citation2018; Wei et al. Citation2015), player movements, player identification, and activity recognition (Miyata et al. Citation2017; Rematas et al. Citation2018; Senocak et al. Citation2018). Real-time processing, a fundamental requirement for applications such as live broadcasting, as suggested by Wang et al. (Citation2018), and Augmented or Virtual Reality, as explored by Lee et al. (Citation2017), is a challenge with 2K sports videos. Downscaling 2K videos to smaller dimensions was found to be effective, as demonstrated by the works of Wang et al. (Citation2018) and Tijtgat et al. (Citation2017). Another possibility was parallel computation, which necessitated more expensive equipment. The problem is exacerbated in multi-view 2K sports videos, and there is currently limited research dedicated to object detection in such footage, as exemplified by Shih's work in 2017. Detecting soccer players is therefore challenging, especially when there are multiple entities involved. There are several methods for detecting players: Santhosh and Kaarthick (Citation2019) investigated a HOG and colour-based detector that produces player detections for each image. For player detection, Ramanathan et al. (Citation2016) used a CNN-based multibox detector. Johnson (Citation2020) experimented with a more complex model, AlphaPose, an open-source multi-person pose estimator based on the COCO dataset.

2.3. Player tracking

For player tracking across a game, Ramanathan et al. (Citation2016) used a Kanade–Lucas–Tomasi feature tracker. Recently developed re-identification models take this a step further: in the study by Arbués-Sangüesa et al. (Citation2019), a multi-scale detection method was introduced to track players. This method extracts both geometric and content features, leading to a multi-scale tracking system that operates on full-HD frames at a resolution of 1920 × 1080 pixels. Zheng et al. (Citation2019) employed an embedding model to extract player attributes together with a combination of IoU and embedding similarities. Zhang et al. (Citation2020) introduced the k-shortest paths tracking technique to boost player tracking efficiency; however, it may be sensitive to dynamic field changes, impacting performance when players deviate from predefined paths. Liang et al. (Citation2020) utilised a similar scheme and an embedding model to extract the numbers and colours of the players. Kamble et al. (Citation2019) presented a deep learning-based ball identification and tracking (DLBT) method for soccer videos; based on IoU, this technique tracked players and the ball with a high precision of approximately 87.45 percent. Zikai Song et al. (Citation2021) introduced a distractor-aware player tracking algorithm and a high-quality benchmark for soccer play tracking. Hurault et al. (Citation2020) developed a deep learning-based soccer player detection and tracking system that is resilient to small players in a range of circumstances. Kim (Citation2019) suggested a topographic surface technique to extract the foreground region for consistently tracking multiple players in varying soccer match scenarios. Naik et al. (Citation2022a, Citation2022b, Citation2023) employed supervised methodologies, including YOLOv3 and YOLOv4, to detect players, the soccer ball, and referees. Zheng (Citation2022) introduced a target-tracking method for soccer using HOG feature-based Kernelized Correlation Filters in combination with deep CNNs. Kim et al. (Citation2018) proposed an adaptive multiscale sampling approach for multiplayer tracking in soccer videos. A generic object detection algorithm detects persons on the playfield but does not distinguish the players from the referees; moreover, such object detection methods require training. In this paper, a novel approach is proposed in which the foreground is extracted from the background, contour analysis is then performed to detect objects, and jersey colour is used to classify players team-wise and to identify the referee. To track the players and referee, an IoU and Euclidean Distance based framework is developed. We have leveraged OpenCV methods for detection and considered jersey colour for the automatic classification of team players and the referees. Furthermore, we have utilised handcrafted approaches to track these individuals with distinct identities.

3. Proposed methodology

To identify players and the referees on the playfield, a background subtraction methodology was used. The BackgroundSubtractorMOG2 algorithm separates foreground objects (moving objects) from the background (static or stationary regions) in a video or image sequence. The "MOG2" in the name refers to an improved, adaptive Mixture of Gaussians model. It uses a statistical model to represent the background and foreground of an image or video sequence; specifically, it models each pixel as a mixture of several Gaussian distributions. This allows it to adapt to changes in lighting conditions, making it suitable for dynamic environments. The algorithm continuously adapts and updates its model of the background over time, dynamically adjusting the parameters of the Gaussian distributions based on the observed pixel values. This adaptability is essential for handling gradual changes in the scene, such as lighting changes and camera motion. Typically, a thresholding operation is applied to the likelihood of a pixel belonging to the background or foreground; pixels with a low likelihood of belonging to the background are considered part of the foreground.

BackgroundSubtractorMOG2 has several parameters that can be fine-tuned to achieve the desired results, including the number of Gaussian components, the learning rate, and the history of frames used for modelling. Proper parameter tuning is crucial to adapt the algorithm to different scenarios and achieve accurate background subtraction. The output of BackgroundSubtractorMOG2 is typically a binary mask in which each pixel is labelled as either foreground (object of interest) or background; this mask can then be used for object tracking and motion analysis. BackgroundSubtractorMOG, BackgroundSubtractorMOG2, and BackgroundSubtractorGMG are some of the background subtractors available in OpenCV. After detection and classification of players and referees, an EUD + IoU based tracker was used to track the players and referees. The block diagram of the proposed methodology to detect and track players and referees is shown in Figure 1, and the complete process is discussed in the following subsections.
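As a concrete illustration of this detection stage, the following minimal sketch shows how a MOG2-based foreground extraction pipeline could be set up with OpenCV in Python. The input file name and the parameter values (history, varThreshold, learning rate, kernel size) are illustrative assumptions, not the exact settings tuned in this study.

```python
import cv2

# Illustrative MOG2 set-up; history, varThreshold and the learning rate are
# assumed values, not the exact parameters used in the paper.
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500,        # number of past frames used to model the background
    varThreshold=16,    # squared Mahalanobis-distance threshold for foreground
    detectShadows=True  # shadows are marked with a mid-grey value (127)
)

cap = cv2.VideoCapture("soccer_clip.mp4")  # hypothetical input video
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The Gaussian mixture adapts over time; learningRate controls how quickly.
    fg_mask = subtractor.apply(frame, learningRate=0.005)
    # Drop shadow pixels (value 127) and keep confident foreground (255).
    _, fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)
    # Morphological opening/closing removes clutter and closes small gaps.
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_CLOSE, kernel)
    # fg_mask is the binary mask passed on to contour analysis.

cap.release()
```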

Figure 1. Block diagram of proposed methodology for players and referee detection and tracking.


3.1. Team-wise player/referee detection and classification

In the proposed research methodology, the primary colours assigned to players and referees, corresponding to the colours of their respective bounding boxes, are determined by grouping matching HSV colours. To enhance adaptability across a range of sports scenes, BackgroundSubtractorMOG2, a background subtraction model leveraging a Gaussian mixture model, is applied to each frame of the video sequence. In this process, the initial frame serves as the background model, and subsequent frames are used to extract foreground pixels by discerning them from the modelled background. The output of BackgroundSubtractorMOG2 undergoes thresholding, followed by kernel-based image smoothing to eliminate shadows. Morphological operations are then applied to each frame to remove unwanted clutter and close small gaps between foreground objects. Players and referees are categorised based on the colours extracted from the foreground pixels, and objects in a frame are detected using contour features. The outlines of entities are generated by connecting continuous boundary points with lines, facilitating shape analysis and object recognition. OpenCV offers various functions for contour detection, along with contour-related parameters such as area, centroid, and bounding box extraction. Following these image processing steps, structural analysis is conducted to gather information on player contours; these contour details are used to determine the centroids and coordinates of the objects' bounding boxes. The algorithm for classifying players and referees based on their jersey colours is depicted in algorithm-1 (Figure 2), providing a clear representation of our approach to object recognition and tracking in dynamic sports scenarios.
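The classification step can be sketched in OpenCV along the following lines. This is a minimal illustration that assumes the foreground mask produced in the previous step; the HSV colour ranges for the white team, the blue team, and the referee are hypothetical placeholders, while the contour-area bounds (1000 to 15,000) follow the thresholds reported in Section 4.

```python
import cv2
import numpy as np

# Hypothetical HSV ranges for the two jerseys and the referee kit; the actual
# thresholds depend on the match footage and are assumed here for illustration.
COLOUR_RANGES = {
    "Player_W":  ((0, 0, 180),   (180, 40, 255)),   # white jerseys
    "Player_B":  ((100, 80, 40), (130, 255, 255)),  # blue jerseys
    "Referee_R": ((0, 120, 70),  (10, 255, 255)),   # red/darker kit
}

def classify_detections(frame, fg_mask, min_area=1000, max_area=15000):
    """Find contours in the foreground mask and label each bounding box
    by the dominant jersey colour inside it."""
    detections = []
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    for c in contours:
        area = cv2.contourArea(c)
        if not (min_area < area < max_area):   # discard small clutter
            continue
        x, y, w, h = cv2.boundingRect(c)
        roi = hsv[y:y + h, x:x + w]
        # Pick the class whose colour mask covers the most pixels in the ROI.
        counts = {label: cv2.countNonZero(
                      cv2.inRange(roi, np.array(lo), np.array(hi)))
                  for label, (lo, hi) in COLOUR_RANGES.items()}
        label = max(counts, key=counts.get)
        cx, cy = x + w // 2, y + h // 2        # centroid of the bounding box
        detections.append({"label": label, "bbox": (x, y, w, h),
                           "centroid": (cx, cy)})
    return detections
```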

Figure 2. Proposed algorithm to classify team wise players and referee.


The approach employs IoU and EUD techniques for tracking both players and the referee. It achieves this by estimating the new positions of objects in the current frame, approximating the geometry of each detected object's bounding box, and using this information to match detections with existing objects.

3.2. Intersection over union based tracking with unique identity

The area of intersection (IoU_Area) and the IoU are calculated with respect to the centroids of the detected (bboxA) and predicted (bboxB) bounding boxes in each frame, as shown in Figure 3, in order to detect and assign a unique identity to each player in consecutive frames, as shown in Figure 4(a). The identity is assigned to detected objects based on threshold values for IoU_Area and IoU; these thresholds also exclude any small elements detected. IoU_Area and IoU are therefore calculated by the following equations:
(1) $X_A = \max\!\left(x_A - \tfrac{w_A}{2},\; x_B - \tfrac{w_B}{2}\right)$
(2) $Y_A = \max\!\left(y_A - \tfrac{h_A}{2},\; y_B - \tfrac{h_B}{2}\right)$
(3) $X_B = \min\!\left(x_A + \tfrac{w_A}{2},\; x_B + \tfrac{w_B}{2}\right)$
(4) $Y_B = \min\!\left(y_A + \tfrac{h_A}{2},\; y_B + \tfrac{h_B}{2}\right)$
(5) $IoU_{Area} = (X_B - X_A) \times (Y_B - Y_A)$
(6) $IoU = \dfrac{IoU_{Area}}{(w_A \times h_A) + (w_B \times h_B) - IoU_{Area}}$
where $(x_A, y_A)$ and $(x_B, y_B)$ are the centroids of the detected ($C_A$) and predicted ($C_B$) bounding boxes, as shown in Figure 3, and $(w_A, h_A)$ and $(w_B, h_B)$ are their widths and heights.
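For reference, the IoU computation of Eqs. (1)–(6) can be written directly as a small Python helper; boxes are assumed to be given in the centre-based (centre_x, centre_y, width, height) form used in the equations.

```python
def iou_from_centroids(bbox_a, bbox_b):
    """IoU between a detected box A and a predicted box B, each given as
    (centre_x, centre_y, width, height), following Eqs. (1)-(6)."""
    xa_c, ya_c, wa, ha = bbox_a
    xb_c, yb_c, wb, hb = bbox_b
    # Top-left corner of the intersection (Eqs. 1-2).
    x_a = max(xa_c - wa / 2, xb_c - wb / 2)
    y_a = max(ya_c - ha / 2, yb_c - hb / 2)
    # Bottom-right corner of the intersection (Eqs. 3-4).
    x_b = min(xa_c + wa / 2, xb_c + wb / 2)
    y_b = min(ya_c + ha / 2, yb_c + hb / 2)
    # Intersection area (Eq. 5); clamped to zero when the boxes do not overlap.
    inter = max(0.0, x_b - x_a) * max(0.0, y_b - y_a)
    # IoU = intersection / union (Eq. 6).
    return inter / (wa * ha + wb * hb - inter)
```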

Figure 3. Calculation of IoUArea and IoU between detected and predicted bounding boxes.


Figure 4. (a) Assigning the identities to players and referees based on the threshold value of IoU in the consecutive frames. (b) Assigning the identities to players and referees by calculating the Euclidian Distance in consecutive frames.


3.3. Euclidean distance based tracking with unique identity

Within the outlined research methodology, we evaluate the spatial distance between each identified player and the referee in consecutive frames by measuring the distance between the centroids of their respective bounding boxes. A distinct identity is assigned to each player and referee based on the shortest Euclidean Distance, as exemplified in Figure 4(b). The allocation of an identity to a detected individual hinges on a specified threshold value for the Euclidean Distance, which is a crucial criterion for accuracy. The Euclidean Distance is computed using the following equations, establishing unique identities for players and the referee and thereby enhancing the precision and reliability of the tracking system:
(7) $C^{(1)}_{bbox_{AB}}(F_1) = \left(x^{(1)}_{AB} + \tfrac{w^{(1)}_{AB}}{2},\; y^{(1)}_{AB} + \tfrac{h^{(1)}_{AB}}{2}\right) = (cx^{(1)}, cy^{(1)})$
(8) $C^{(2)}_{bbox_{AB}}(F_2) = \left(x^{(2)}_{AB} + \tfrac{w^{(2)}_{AB}}{2},\; y^{(2)}_{AB} + \tfrac{h^{(2)}_{AB}}{2}\right) = (cx^{(2)}, cy^{(2)})$
(9) $C^{(3)}_{bbox_{AB}}(F_3) = \left(x^{(3)}_{AB} + \tfrac{w^{(3)}_{AB}}{2},\; y^{(3)}_{AB} + \tfrac{h^{(3)}_{AB}}{2}\right) = (cx^{(3)}, cy^{(3)})$
(10) $EUD_{F12} = \sqrt{(cx^{(2)} - cx^{(1)})^2 + (cy^{(2)} - cy^{(1)})^2} \le Max_{distance}$
(11) $EUD_{F23} = \sqrt{(cx^{(3)} - cx^{(2)})^2 + (cy^{(3)} - cy^{(2)})^2} \le Max_{distance}$
where $C^{(1)}_{bbox_{AB}}(F_1)$, $C^{(2)}_{bbox_{AB}}(F_2)$, and $C^{(3)}_{bbox_{AB}}(F_3)$ are the centroids of each detected bounding box in three consecutive frames, calculated as shown in Figure 4(a).

In the proposed research methodology, $EUD_{F12}$ denotes the Euclidean Distance computed between each bounding box identified in frame-1 and frame-2. Similarly, $EUD_{F23}$ denotes the Euclidean Distance calculated between the bounding boxes detected in frame-2 and frame-3, as illustrated in Figure 4(b). This representation quantifies the spatial relationships between consecutive frames and underpins the object-tracking process.
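A minimal Python rendering of the centroid and distance computations in Eqs. (7)–(11), assuming bounding boxes in (x, y, w, h) form (top-left corner plus width and height):

```python
import math

def centroid(bbox):
    """Centroid of a bounding box given as (x, y, w, h), as in Eqs. (7)-(9)."""
    x, y, w, h = bbox
    return (x + w / 2, y + h / 2)

def euclidean_distance(bbox_prev, bbox_curr):
    """Centroid distance between boxes detected in consecutive frames, as in
    Eqs. (10)-(11); it is compared against a maximum-distance threshold when
    deciding whether the two detections share an identity."""
    cx1, cy1 = centroid(bbox_prev)
    cx2, cy2 = centroid(bbox_curr)
    return math.hypot(cx2 - cx1, cy2 - cy1)
```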

Therefore, to determine whether the same player or referee has been detected, minimum and maximum threshold values for IoU and EUD were defined, and the identity is then assigned in consecutive frames as shown in algorithm-2 (Figure 5).
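The sketch below shows one way the frame-to-frame identity assignment summarised in algorithm-2 could be organised, reusing the iou_from_centroids() and euclidean_distance() helpers from the sketches above. The greedy matching strategy and the IOU_MIN and EUD_MAX values are assumptions for illustration, not the exact logic or thresholds of the paper.

```python
# Greedy frame-to-frame identity assignment combining the IoU and EUD cues.
IOU_MIN = 0.3      # assumed minimum overlap to accept an IoU match
EUD_MAX = 50.0     # assumed maximum centroid displacement (pixels)

def assign_identities(tracks, detections, next_id):
    """tracks: dict mapping identity -> last bounding box (x, y, w, h).
    detections: list of (x, y, w, h) boxes of the same class in the new frame.
    Returns the updated tracks and the next free identity."""
    updated = {}
    unmatched = list(detections)
    for track_id, prev_box in tracks.items():
        best, best_score = None, 0.0
        for det in unmatched:
            # Convert (x, y, w, h) to the centre-based form used by the IoU helper.
            a = (prev_box[0] + prev_box[2] / 2, prev_box[1] + prev_box[3] / 2,
                 prev_box[2], prev_box[3])
            b = (det[0] + det[2] / 2, det[1] + det[3] / 2, det[2], det[3])
            overlap = iou_from_centroids(a, b)
            close = euclidean_distance(prev_box, det) <= EUD_MAX
            if overlap >= IOU_MIN and close and overlap > best_score:
                best, best_score = det, overlap
        if best is not None:
            updated[track_id] = best        # same identity carried forward
            unmatched.remove(best)
    for det in unmatched:                   # unmatched detections get new identities
        updated[next_id] = det
        next_id += 1
    return updated, next_id
```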

Figure 5. Proposed tracking algorithm to assign the identity to players and referee.


3.4. Performance metrics

The following metrics were used to assess object detection and classification. The proposed tracking algorithm recognises and tracks three classes (Player_W, Player_B, Referee_R). Therefore, the metrics are described as follows.

Precision is defined as:
(12) $P = \dfrac{T_p}{T_p + F_p}$
Recall is defined as:
(13) $R = \dfrac{T_p}{T_p + F_n}$
Therefore, from Precision and Recall, the F1-score is defined as:
(14) $F1\text{-}score = \dfrac{2 \cdot P \cdot R}{P + R}$
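For completeness, a small Python helper evaluating Eqs. (12)–(14) per class from true-positive, false-positive, and false-negative counts; the counts in the usage comment are made up purely for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1-score, as in Eqs. (12)-(14)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example with illustrative (made-up) counts for one class:
# p, r, f1 = precision_recall_f1(tp=450, fp=30, fn=20)
```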

4. Results and experimental discussion

The experimental configuration is as follows: Windows 10 Pro operating system, Intel Xeon 64-bit CPU at 3.60 GHz, 64 GB RAM, an NVIDIA Quadro P4000 GPU (8 GB, 1792 CUDA cores), CUDA 10.0, and the cuDNN 7.5 GPU acceleration library.

As BackgroundSubtractorMOG2 was utilised to detect objects (players and the referee) on the soccer field, no model training is required. The contour area was calculated from the object centroid and the coordinates of the objects' bounding boxes in order to remove small elements; the area threshold for detecting players and the referee was set to greater than 1000 and less than 15,000. Jersey colour was used to classify team-based players as well as the referee. The performance of the suggested technique for multiple object classification and tracking is discussed in this section and evaluated using the ISSIA dataset. Using the ISSIA dataset, the proposed technique was compared against benchmarks that have been extensively used to track players in soccer videos. To show the accuracy and efficacy of the proposed methodology, it was compared to conventional techniques often used for tracking multiple objects, as shown in the subsections below.

4.1. Qualitative analysis

This section presents the tracking results from a video sequence obtained from the ISSIA dataset, which was introduced by D'Orazio et al. (Citation2009). For efficient tracking, players and referees on the field need to be precisely detected and categorised as Player_W, Player_B, and Referee_R. It is especially important that the location of each player within the squad is precisely detected, correctly classified, and tracked with the same identity. With tracking based on IoU alone, player identities changed from frame to frame whenever the IoU was lost, as shown by the yellow box in Figure 6 (camera-2 view). Moreover, with tracking based on EUD alone, the identities of almost all players and the referee (Id-364, Id-365, and Id-356) changed from frame to frame, as shown by the yellow and sky-blue boxes in Figure 7 (camera-1 view). With tracking based on the combined IoU and EUD, the identities of all players and the referee (Id-8, Id-19, and Id-23) remain the same from frame to frame, as shown by the yellow and sky-blue boxes in Figure 8 (camera-2 view).

Figure 6. Assigning the identity to the players and referee based on IoU while tracking on camera-2 from ISSIA dataset using proposed methodology (Frame-253 to Frame-278).


Figure 7. Assigning the identity to the players and referee based on EUD while tracking on camera-1 from ISSIA dataset using proposed methodology (Frame-39 to Frame-138).


Figure 8. Assigning the identity to the players and referee based on IoU and EUD while tracking on camera-2 from ISSIA dataset using proposed methodology (Frame-43 to Frame-136).


The proposed approach tracked team-wise players and the referee; the tracking results from three distinct camera views of the ISSIA dataset videos are presented in Figure 9.

Figure 9. Tracking performance of three different cameras views around the soccer field taken from ISSIA Dataset.


4.2. Quantitative analysis

Tracking results were evaluated using several performance measures in this subsection. To statistically examine the efficacy of the tracking algorithm, 500 frames from the actual ISSIA dataset video were manually labelled as Player_W, Player_B, and Referee_R. The proposed tracking approach classifies and tracks three classes (Player_W, Player_B, and Referee_R), whereas the other methods identified and tracked only the players or the ball. First, the comprehensiveness of the tracking method was assessed using five measures: precision, recall, F1-score, accuracy, and the confusion matrix. The performance of the proposed methodology on the ground truth labels is shown in Table 1, and the corresponding confusion matrix is shown in Figure 10. Identity associations with respect to IoU and Euclidean Distance are shown in Table 2.

Figure 10. Confusion matrix of ground truth and predicted labels on the ISSIA dataset.


Table 1. Performance of the proposed methodology on ground truth labels.

Table 2. Identity assigning based on the IoU and EUD in 12 consecutive frames.

4.3. Limitations of proposed approach

There are a few limitations to our approach. The algorithm demonstrates exceptional robustness with static backgrounds; however, it encounters difficulties when there is movement in the background, as player detection relies on background subtraction. Identity switching also becomes increasingly noticeable in scenarios with severe occlusions, given that identity assignment depends on the IoU and Euclidean distance methods.

4.4. Comparative analysis

Tracking precision denotes the tracker's ability to forecast precise object coordinates, regardless of its proficiency in recognising object configurations, maintaining consistent trajectories, and so on. To evaluate tracking performance quantitatively, the proposed technique was compared with six representative tracking algorithms proposed for the classes Player (P), Ball (B), and Referee (R), i.e. DLBT (Kamble et al. Citation2019), Tracklet-based Multi-Commodity Network Flow (T-MCNF) (Song et al. Citation2021), Small-Soccer Player (Hurault et al. Citation2020), Topography (Kim Citation2019), Deep-Player Track (Naik et al. Citation2022b), and the Adaptive Multiscale approach (Kim et al. Citation2018), on the ISSIA dataset, using the Precision, Recall, F1-score, and IoU/EUD metrics, as shown in Table 3.

Table 3. The proposed tracking algorithm was compared with state-of-the-art approaches on the ISSIA dataset.

In contrast, the methodology described in this paper addresses the above problem by handling situations in which the jersey colours of players and the referee change: users manually update the colour thresholds within the algorithm to accommodate the new jersey colours for each half of a soccer match.

4.5. Industrial significance of the proposed approach

The “EIoU-Distance Loss” holds significant industrial relevance in the context of automated team-wise player detection and tracking with jersey colour recognition in soccer. This approach addresses a critical need in various industries and applications:

Sports Analytics: In sports analytics for games such as soccer, automated player detection and tracking are invaluable. They allow teams and coaches to gain insights into player performance, team dynamics, and opponent strategies. The ability to recognise jersey colours is essential for tracking players effectively.

Broadcasting and Media: Automated player tracking enhances the viewing experience for sports fans by providing real-time information about player movements and statistics. It contributes to better sports broadcasting and analysis, attracting more viewers and advertisers.

Security and Surveillance: Automated tracking systems can be applied in security and surveillance scenarios, including monitoring crowd behaviour during large sporting events. Recognising players and referees by their jersey colours can aid in crowd management and security measures.

Training and Coaching: Soccer teams and academies can use automated player tracking for training and coaching purposes. It assists in assessing player performance, making tactical decisions, and improving overall team strategies.

Merchandising and Fan Engagement: In the commercial aspect of sports, recognising players by their jersey colours can be used for personalised merchandise, fan engagement, and marketing campaigns.

In summary, the EIoU-Distance Loss method for automated team-wise player detection and tracking with jersey colour recognition in soccer has significant industrial applications across sports analytics, broadcasting, security, training, commercialisation, and technology integration. It enhances the overall experience for sports enthusiasts, professionals, and businesses involved in the sports industry.

5. Conclusion

This study introduces a comprehensive methodology for multi-object detection, categorisation, and tracking, commencing with foreground extraction from the background, followed by precise object detection. A notable aspect is the effective categorisation of team players and officials using a colour mask, specifically identifying Player_W, Player_B, and the Referee. Subsequently, the classification facilitates the computation of Intersection over Union (IoU) and Euclidean Distance (EUD) between detected and predicted bounding boxes across consecutive frames. The assignment of unique IDs to each player and referee plays a pivotal role in significantly enhancing tracking performance.

The study's findings are systematically presented in three main sections: (1) Visual representations of qualitative analysis results, considering the presence or absence of an embedded platform; (2) Quantitative analysis results conveyed through diverse performance metrics; (3) Comparative analysis, systematically benchmarking the proposed methodology against other state-of-the-art methods using the ISSIA dataset.

Acknowledging two inherent limitations, the study notes that (i) when a player is undetected in one frame but identified in a subsequent frame, a distinct ID is assigned and (ii) instances of player occlusion or crossover result in the assignment of different IDs to the involved players.

In the pursuit of enhancing tracking capabilities for future applications, specifically with the goal of averting instances of identity confusion among players within the same squad, our research advocates for the integration of a bi-lateral transformation technique and the implementation of jersey number identification. The comprehensive nature of the proposed methodology underscores its adaptability, indicating potential utility across a spectrum of real-world scenarios. This extends to other sports such as basketball and rugby and, beyond sport, to tracking challenges in various other contexts. The recommendation emerges from a recognition of the multifaceted demands of tracking technology and the need for a versatile solution that can effectively navigate diverse scenarios.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The authors reported there is no funding associated with the work featured in this article.

References

  • Arbués-Sangüesa, A., Haro, G., & Ballester, C. (2019). Multi-Person tracking by multi-scale detection in Basketball scenarios. arXiv preprint arXiv:1907.04637.
  • Burić, M., Pobar, M., & Ivašić-Kos, M. (2018, May 21-25). Object detection in sports videos. 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (pp. 1034–1039). IEEE.
  • Ding, B., Zhang, R., Xu, L., Liu, G., Yang, S., Liu, Y., & Zhang, Q. (2023). U2D2Net: Unsupervised unified image dehazing and denoising network for single hazy image enhancement. IEEE Transactions on Multimedia, 13.
  • D'Orazio, T., Leo, M., Mosca, N., Spagnolo, P., & Mazzeo, P. L. (2009, 2-4 September). A semi-automatic system for ground truth generation of soccer video sequences. 2009 sixth IEEE International Conference on Advanced Video and Signal Based Surveillance (pp. 559–564). IEEE.
  • Guan, D., Ma, T., Yuan, W., Lee, Y. K., & Jehad Sarkar, A. M. (2011). Review of sensor-based activity recognition systems. IETE Technical Review, 28(5), 418–433. https://doi.org/10.4103/0256-4602.85975
  • Hu, P., & Ramanan, D. (2017, 21-26 July). Finding tiny faces. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 951–959).
  • Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., … Murphy, K. (2017, 21-26 July). Speed/accuracy trade-offs for modern convolutional object detectors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7310–7311).
  • Hurault, S., Ballester, C., & Haro, G. (2020, October). Self-supervised small soccer player detection and tracking. Proceedings of the 3rd International Workshop on Multimedia Content Analysis in Sports (pp. 9–18).
  • Johnson, N. (2020, 6-7 March). Extracting player tracking data from video using non-stationary cameras and a combination of computer vision techniques. In Proceedings of the 14th MIT Sloan Sports Analytics Conference, Boston, MA, USA, (Vol. 218).
  • Kamble, P. R., Keskar, A. G., & Bhurchandi, K. M. (2019). A deep learning ball tracking system in soccer videos. Opto-Electronics Review, 27(1), 58–69. https://doi.org/10.1016/j.opelre.2019.02.003
  • Kim, W. (2019). Multiple object tracking in soccer videos using topographic surface analysis. Journal of Visual Communication and Image Representation, 65, 102683. https://doi.org/10.1016/j.jvcir.2019.102683
  • Kim, W., Moon, S. W., Lee, J., Nam, D. W., & Jung, C. (2018). Multiple player tracking in soccer videos: An adaptive multiscale sampling approach. Multimedia Systems, 24(6), 611–623. https://doi.org/10.1007/s00530-018-0586-9
  • Lee, W. T., Chen, H. I., Chen, M. S., Shen, I. C., & Chen, B. Y. (2017). High-resolution 360 video foveated stitching for real-time VR. Computer Graphics Forum, 36(7), 115–123. https://doi.org/10.1111/cgf.13277
  • Liang, Q., Wu, W., Yang, Y., Zhang, R., Peng, Y., & Xu, M. (2020). Multi-player tracking for multi-view sports videos with improved k-shortest path algorithm. Applied Sciences, 10(3), 864. https://doi.org/10.3390/app10030864
  • Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I (pp. 21–37). Springer International Publishing.
  • Miyata, S., Saito, H., Takahashi, K., Mikami, D., Isogawa, M., & Kimata, H. (2017, 21-26 July). Ball 3D trajectory reconstruction without preliminary temporal and geometrical camera calibration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 108–113).
  • Murthy, C. B., Hashmi, M. F., Bokde, N. D., & Geem, Z. W. (2020). Investigations of object detection in images/videos using various deep learning techniques and embedded platforms—A comprehensive review. Applied Sciences, 10(9), 3280. https://doi.org/10.3390/app10093280
  • Naik, B. T., & Hashmi, M. F. (2023). YOLOv3-SORT: Detection and tracking player/ball in soccer sport. Journal of Electronic Imaging, 32(1), 011003.
  • Naik, B. T., Hashmi, M. F., & Bokde, N. D. (2022a). A comprehensive review of computer vision in sports: Open issues, future trends and research directions. Applied Sciences, 12(9), 4429. https://doi.org/10.3390/app12094429
  • Naik, B. T., Hashmi, M. F., Geem, Z. W., & Bokde, N. D. (2022b). Deepplayer-track: Player and referee tracking with jersey color recognition in soccer. IEEE Access, 10, 32494–32509. https://doi.org/10.1109/ACCESS.2022.3161441
  • Ramanathan, V., Huang, J., Abu-El-Haija, S., Gorban, A., Murphy, K., & Fei-Fei, L. (2016, 27-30 June). Detecting events and key actors in multi-person videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3043–3053).
  • Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016, 27-30 June). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779–788).
  • Redmon, J., & Farhadi, A. (2017, 21-26 July). YOLO9000: better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7263–7271).
  • Redmon, J., & Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767.
  • Rematas, K., Kemelmacher-Shlizerman, I., Curless, B., & Seitz, S. (2018, 18-23 June). Soccer on your tabletop. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4738–4747).
  • Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28.
  • Santhosh, P. K., & Kaarthick, B. (2019). An automated player detection and tracking in basketball game. Computers, Materials & Continua, 58(3).
  • Senocak, A., Oh, T. H., Kim, J., & So Kweon, I. (2018, 18-23 June). Part-based player identification using deep convolutional representation and multi-scale pooling. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1732–1739).
  • Singh, U., Determe, J. F., Horlin, F., & De Doncker, P. (2021). Crowd monitoring: State-of-the-art and future directions. IETE Technical Review, 38(6), 578–594. https://doi.org/10.1080/02564602.2020.1803152
  • Song, Z., Wan, Z., Yuan, W., Tang, Y., Yu, J., & Chen, Y. P. P. (2021, 21-24 August). Distractor-aware tracker with a domain-special optimized benchmark for soccer player tracking. Proceedings of the 2021 International Conference on Multimedia Retrieval (pp. 276–284).
  • Theagarajan, R., Pala, F., Zhang, X., & Bhanu, B. (2018, 18-23 June). Soccer: Who has the ball? Generating visual analytics and player statistics. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1749–1757).
  • Tijtgat, N., Van Ranst, W., Goedeme, T., Volckaert, B., & De Turck, F. (2017, 22-29 October). Embedded real-time object detection for a UAV warning system. Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 2110–2118).
  • Wang, J., Amos, B., Das, A., Pillai, P., Sadeh, N., & Satyanarayanan, M. (2018). Enabling live video analytics with a scalable and privacy-aware framework. ACM Transactions on Multimedia Computing, Communications, and Applications, 14(3s), 1–24. https://doi.org/10.1145/3209659
  • Wei, X., Sha, L., Lucey, P., Carr, P., Sridharan, S., & Matthews, I. (2015). Predicting ball ownership in basketball from a monocular view using only player trajectories. Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 63–70).
  • Yuan, D., Chang, X., Li, Z., & He, Z. (2022a). Learning adaptive spatial-temporal context-aware correlation filters for UAV tracking. ACM Transactions on Multimedia Computing, Communications, and Applications, 18(3), 1–18. https://doi.org/10.1145/3486678
  • Yuan, D., Shu, X., Liu, Q., & He, Z. (2022b). Aligned spatial-temporal memory network for thermal infrared target tracking. IEEE Transactions on Circuits and Systems II: Express Briefs, 70(3), 1224–1228.
  • Zhang, R., Wu, L., Yang, Y., Wu, W., Chen, Y., & Xu, M. (2020). Multi-camera multi-player tracking with deep player identification in sports video. Pattern Recognition, 102, 107260. https://doi.org/10.1016/j.patcog.2020.107260
  • Zhang, R., Yang, S., Zhang, Q., Xu, L., He, Y., & Zhang, F. (2022). Graph-based few-shot learning with transformed feature propagation and optimal class allocation. Neurocomputing, 470, 247–256. https://doi.org/10.1016/j.neucom.2021.10.110
  • Zheng, B. (2022). Soccer player video target tracking based on deep learning. Mobile Information Systems, 2022, 1–6.
  • Zheng, Z., Yang, X., Yu, Z., Zheng, L., Yang, Y., & Kautz, J. (2019). Joint discriminative and generative learning for person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2138–2147).