Exploring Event and Tracking Data using Metrica Sports Open Data Part III -> Average Positions

This is a series of articles exploring the Metrica Sports Open Data available here: https://github.com/metrica-sports/sample-data. The dataset includes two real association football matches, each with both event data and tracking data. The teams and players are not identified (and I believe each match involves different teams). The goal is to explore analytic and visual techniques that could be applied to any kind of event-based or tracking data.

In this part, I’ll look at some ways to calculate the average positions of players during a match, and in particular compare where a player appears to be according to the event data versus the tracking data. Part I already did this to visualize the passing networks, using the passing event coordinates.

The average position is literally that: take the average of the x-coordinates and the average of the y-coordinates, and use these as the player’s new coordinates. This is known as the mean center in other circles. Like the mean of any distribution, it is a measure of centrality and can be skewed by outliers. For example, if a striker makes a few passes deep in their own half, those events may pull the striker’s average position back toward their own goal.
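A minimal Python sketch of the mean center, using NumPy and made-up pass coordinates in normalised 0-1 pitch units (the values are illustrative, not taken from the Metrica data, and `mean_center` is just my own helper name). It also shows how two passes deep in a striker’s own half drag the average back.

```python
import numpy as np

def mean_center(xs, ys):
    """Return the mean center: (average of x-coordinates, average of y-coordinates)."""
    return float(np.mean(xs)), float(np.mean(ys))

# Hypothetical pass locations for one striker, in normalised 0-1 pitch units.
xs = [0.70, 0.75, 0.80, 0.72]
ys = [0.50, 0.45, 0.55, 0.50]
print(mean_center(xs, ys))

# Add two passes near the striker's own goal: the mean x is pulled back noticeably.
xs_with_outliers = xs + [0.20, 0.25]
ys_with_outliers = ys + [0.50, 0.50]
print(mean_center(xs_with_outliers, ys_with_outliers))
```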

And as with other measures of a distribution, there is a geographic equivalent to the standard deviation. It comes in two forms: the standard distance and the standard deviational ellipse. The standard distance is the square root of the average squared distance between each point and the average position, and gives a sense of how spread out a player’s activity is. The standard deviational ellipse also gives a sense of direction, showing whether that spread is stretched along a particular axis. I’ll scale each position marker by the standard distance to show who tends not to deviate much from their average position.
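The standard distance, and one common way to build a deviational ellipse, can be sketched as below. This is a sketch under assumptions: the ellipse here comes from the eigenvectors of the population covariance matrix (one standard deviation along each principal axis), and GIS tools may scale the axes differently. The function names are my own, not from any library.

```python
import math
import numpy as np

def standard_distance(xs, ys):
    """Square root of the mean squared distance of each point from the mean center."""
    x = np.asarray(xs, dtype=float)
    y = np.asarray(ys, dtype=float)
    return math.sqrt(np.mean((x - x.mean()) ** 2) + np.mean((y - y.mean()) ** 2))

def deviational_ellipse(xs, ys):
    """Principal-axis ellipse from the population covariance matrix.

    Returns (semi_major, semi_minor, angle), with the angle in radians in
    [0, pi), measured from the x-axis to the major axis.
    """
    pts = np.vstack([np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)])
    cov = np.cov(pts, bias=True)            # divide by n, matching standard_distance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative round-off
    major = eigvecs[:, 1]                   # eigenvector of the largest eigenvalue
    angle = math.atan2(major[1], major[0]) % math.pi
    return math.sqrt(eigvals[1]), math.sqrt(eigvals[0]), angle

# Points lying on a diagonal: all of the spread is along the 45-degree axis.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]
sd = standard_distance(xs, ys)
a, b, theta = deviational_ellipse(xs, ys)
print(sd, a, b, math.degrees(theta))
```

Because the eigenvalues sum to the trace of the covariance matrix, the squared standard distance equals the sum of the squared semi-axes, so the circle and the ellipse summarise the same total spread; the ellipse just splits it by direction.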

You can view the (semi) interactive visualization here:

You can access the notebook for loading and processing the data in Tableau here.

In each visualization, the average position is split by period (orange = 2nd period). I think this adds a lot of useful information about where a player typically plays. Using event locations as the basis for a player’s average position reveals the center of their on-the-ball activity space. You can use either all events or a subset, such as pass events. There is very little difference between all events and pass events, since passes make up the majority of all events.
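A sketch of the per-player, per-period averages from event data with pandas. I’m assuming the column names `Team`, `Type`, `Period`, `From`, `Start X` and `Start Y` as they appear in the Metrica sample event CSVs; check them against your copy. The small DataFrame and the helper name `event_average_positions` are invented for illustration, and the sketch assumes coordinates are already oriented consistently within each period.

```python
import pandas as pd

def event_average_positions(events, event_type=None):
    """Mean center and event count per team, player and period.

    Assumes Metrica-style columns 'Team', 'Type', 'Period', 'From',
    'Start X', 'Start Y'. Pass event_type (e.g. "PASS") to use a subset.
    """
    if event_type is not None:
        events = events[events["Type"] == event_type]
    return (
        events.groupby(["Team", "From", "Period"])
        .agg(x=("Start X", "mean"), y=("Start Y", "mean"), n=("Start X", "count"))
        .reset_index()
    )

# Invented events for one player across two periods.
events = pd.DataFrame({
    "Team": ["Home"] * 5,
    "Type": ["PASS", "PASS", "BALL LOST", "PASS", "PASS"],
    "Period": [1, 1, 1, 2, 2],
    "From": ["Player2"] * 5,
    "Start X": [0.30, 0.40, 0.50, 0.20, 0.30],
    "Start Y": [0.60, 0.70, 0.80, 0.60, 0.60],
})
print(event_average_positions(events))          # all events
print(event_average_positions(events, "PASS"))  # pass events only
```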

Interestingly, the goalkeeper seems to be one of the players with more variation in the distribution of their events, as suggested by the standard distance (the size of the circle). The defenders also moved around more than some of the more forward-playing positions. The smaller circles may indicate players who were substituted on at some point.

The home team’s defenders (Player2 and Player3) played closer to their own goal in the second period. This was probably in response to the away team pressing and playing deeper with their midfield and forwards. Interestingly, in the interactive version you can see Player23 move from the outside to the inside of play during the second period. However, compared with the tracking data, this turns out to be misleading.

Tracking data provides near-continuous information on where a player is, both on and off the ball. As expected, the average position will be different, because it also captures where players are when they are not on the ball, including when they are resting. In a sense, this is a rough comparison of on-the-ball activity (event-based positional averages) and off-the-ball activity (tracking-based averages). I say rough because the tracking data still includes the on-the-ball activity.
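For tracking data the same idea applies, but averaged over every frame. The sketch below assumes a wide table with one row per frame, a `Period` column and a hypothetical `<player>_x` / `<player>_y` column pair per player, with NaN whenever the player is not on the pitch; the raw Metrica tracking CSVs may need reshaping into something like this first. It also returns how many frames each player was on the pitch.

```python
import numpy as np
import pandas as pd

def tracking_average_positions(tracking):
    """Mean center per player and period over all frames with valid positions.

    Assumes one row per frame, a 'Period' column, and hypothetical
    '<player>_x' / '<player>_y' columns that are NaN when the player is off.
    """
    players = sorted(c[:-2] for c in tracking.columns if c.endswith("_x"))
    rows = []
    for period, frames in tracking.groupby("Period"):
        for p in players:
            x = frames[f"{p}_x"]
            y = frames[f"{p}_y"]
            on_pitch = x.notna() & y.notna()
            if on_pitch.any():  # skip players who never appeared in this period
                rows.append({
                    "player": p, "Period": period,
                    "x": x[on_pitch].mean(), "y": y[on_pitch].mean(),
                    "frames": int(on_pitch.sum()),
                })
    return pd.DataFrame(rows)

# Invented frames: Home_14 only comes on in the second period.
tracking = pd.DataFrame({
    "Period": [1, 1, 1, 2, 2],
    "Home_2_x": [0.30, 0.32, 0.34, 0.40, 0.42],
    "Home_2_y": [0.50, 0.52, 0.54, 0.50, 0.50],
    "Home_14_x": [np.nan, np.nan, np.nan, 0.60, 0.62],
    "Home_14_y": [np.nan, np.nan, np.nan, 0.40, 0.40],
})
print(tracking_average_positions(tracking))
```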

The goalkeepers for both teams tend to play further forward off the ball and further back on the ball. This is probably because their only events are a few passes, plus whatever is recorded when a shot occurs.

While all players seemed to play a little closer to the middle in the tracking data, Player4 on the home team is probably the best example of the difference. Their average position for on-the-ball activity was much further up the field and out on the goalkeeper’s left-hand side (top of the figure). In the tracking data, they played further back and more towards the center. You can use the interactive version to explore this a little more. To me, at least, that suggests players converge towards the center of the field when out of possession and spread out more when in possession, or at least converge when they personally don’t have the ball.
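To put a number on the on-the-ball versus off-the-ball difference for players like Player4, you could join the two sets of averages and measure the shift. This sketch assumes both tables already share `player`, `Period`, `x` and `y` columns with the same player identifiers, and uses an assumed 105 x 68 m pitch to convert normalised units to metres; `position_shift` and the example values are hypothetical.

```python
import numpy as np
import pandas as pd

PITCH_LENGTH_M = 105.0  # assumed pitch length; check the real dimensions for your data
PITCH_WIDTH_M = 68.0    # assumed pitch width

def position_shift(event_avg, tracking_avg):
    """Shift from event-based to tracking-based average position, in metres.

    dx_m / dy_m are tracking minus event; shift_m is the straight-line distance.
    """
    merged = event_avg.merge(tracking_avg, on=["player", "Period"],
                             suffixes=("_event", "_track"))
    merged["dx_m"] = (merged["x_track"] - merged["x_event"]) * PITCH_LENGTH_M
    merged["dy_m"] = (merged["y_track"] - merged["y_event"]) * PITCH_WIDTH_M
    merged["shift_m"] = np.hypot(merged["dx_m"], merged["dy_m"])
    return merged.sort_values("shift_m", ascending=False).reset_index(drop=True)

# Invented averages in normalised 0-1 units.
event_avg = pd.DataFrame({"player": ["Player4", "Player5"], "Period": [1, 1],
                          "x": [0.60, 0.40], "y": [0.20, 0.50]})
tracking_avg = pd.DataFrame({"player": ["Player4", "Player5"], "Period": [1, 1],
                             "x": [0.50, 0.40], "y": [0.35, 0.50]})
print(position_shift(event_avg, tracking_avg)[["player", "dx_m", "dy_m", "shift_m"]])
```

Sorting by the shift puts the players whose on-the-ball and overall positions differ most at the top, which is a quick way to find candidates like Player4 before looking at the figures.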

Comparing Player22 in the passing event data and the tracking data illustrates the differences between these two datasets. In the second period in particular, Player22’s pass events moved to the upper-left quadrant (from the goalkeeper’s perspective). But according to the tracking data, that same area was largely empty, and Player22 was much further back. To me, this suggests they would move into that space to create passing opportunities.
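One way to check an observation like this is to compare the share of a player’s pass events inside a region with the share of their tracking frames there. In the sketch below, `share_in_region` is a hypothetical helper and the rectangle bounds are a stand-in for the upper-left quadrant; which bounds actually correspond to that quadrant depends on how the coordinates are oriented in your data.

```python
import numpy as np

def share_in_region(xs, ys, x_min, x_max, y_min, y_max):
    """Fraction of non-missing points inside an axis-aligned rectangle."""
    x = np.asarray(xs, dtype=float)
    y = np.asarray(ys, dtype=float)
    valid = ~(np.isnan(x) | np.isnan(y))  # ignore frames where the player is off the pitch
    inside = valid & (x >= x_min) & (x <= x_max) & (y >= y_min) & (y <= y_max)
    return float(inside.sum() / valid.sum()) if valid.any() else float("nan")

# Hypothetical region standing in for the upper-left quadrant.
region = dict(x_min=0.5, x_max=1.0, y_min=0.0, y_max=0.5)

# Invented pass locations and tracking positions for one player.
pass_share = share_in_region([0.6, 0.7, 0.3, 0.8], [0.2, 0.3, 0.6, 0.1], **region)
track_share = share_in_region([0.3, 0.35, 0.4, 0.6, np.nan],
                              [0.5, 0.6, 0.55, 0.2, np.nan], **region)
print(pass_share, track_share)
```

A high share of pass events in a region the player rarely occupies overall is consistent with the player moving into that space mainly to receive or make passes.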