Entropy to Understand Luck in the Premier League
One of the most popular statistics in football (soccer) analytics is expected goals (xG). There are different ways to describe xG. For one, it is the number of goals you would expect a team or player to score given where their shots came from. It can be based purely on geometry (angle and distance from the goal), on historical shot data, or on a combination of these and other variables. It tells you how well a team should be performing if everything went exactly as expected.
To me this implies something harder to measure: luck. Are some teams luckier than others? Are their shots going in as often as xG suggests they should, or is everything going wrong for that team? In other words, is there a way to measure how lucky a team has been with its goal scoring?
One thought I had is to recast the problem as one of certainty. In machine learning, entropy is a measure of uncertainty. As Kelleher et al. (2015) put it, entropy is related to the “uncertainty associated with guessing the result if you were to make a random selection from the set [of playing cards].” If all the cards are the same, you are pretty sure what you will get if you pick at random (entropy = 0). If they are all different playing cards, you are much less certain (entropy is a positive number). Shannon’s entropy model is a simple way to calculate the entropy of something with a binary outcome. In this case it is the probability of a positive outcome multiplied by the base 2 logarithm of that same probability. This is summed across all cases and multiplied by negative 1.
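The calculation described above can be sketched in a few lines of Python. This is only an illustration of the formula as worded in the text; the function names are my own, and treating a probability of 0 as contributing nothing (the usual convention that 0 times log 0 is 0) is an assumption.

```python
import math


def entropy_term(p: float) -> float:
    """One term of the sum: -p * log2(p).

    Assumption: a probability of 0 contributes 0, since log2(0) is undefined.
    """
    if p <= 0.0:
        return 0.0
    return -p * math.log2(p)


def entropy(probabilities) -> float:
    """Sum the -p * log2(p) terms over all cases."""
    return sum(entropy_term(p) for p in probabilities)


# A deck where every card is the same: one outcome with probability 1.
same_cards = entropy([1.0])

# A deck of four different cards: each is drawn with probability 1/4.
different_cards = entropy([0.25, 0.25, 0.25, 0.25])
```

With these inputs the identical deck has an entropy of 0 and the four-card deck has an entropy of 2, which matches the playing-card intuition: the more varied the set, the less certain the draw.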
Here, I will use a season as the sample. For each match, the probability is the number of goals scored divided by the number of shots taken, and each match's term is summed across the whole season to give the entropy score. I calculate one score from the goals a team scored divided by the shots it took (entropy for), and a second from the goals scored against the team divided by the shots against it (entropy against).
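A minimal sketch of that season-level calculation might look like the following. The match data is made up purely for illustration, the function name is hypothetical, and skipping matches with no shots or no goals (where the probability is undefined or zero) is my assumption rather than something stated above.

```python
import math


def season_entropy(matches) -> float:
    """Sum -p * log2(p) over a season, where p = goals / shots per match.

    `matches` is an iterable of (goals, shots) pairs, one per match.
    Assumption: matches with zero shots or zero goals add nothing.
    """
    total = 0.0
    for goals, shots in matches:
        if shots == 0 or goals == 0:
            continue
        p = goals / shots
        total += -p * math.log2(p)
    return total


# Hypothetical three-match sample: (goals, shots) for the team...
shots_for = [(2, 8), (1, 4), (0, 10)]
# ...and (goals conceded, shots faced) against it.
shots_against = [(1, 2), (0, 6), (0, 9)]

entropy_for = season_entropy(shots_for)
entropy_against = season_entropy(shots_against)
```

Running the same function once on the shots a team took and once on the shots it faced gives the two scores used in the rest of the article.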
Interestingly, teams that tend to do well have less certainty (more randomness) in their entropy for and maintain a low entropy against (less randomness in being scored on). Thus the ratio of entropy for to entropy against provides information on how well a team is performing. To my mind, at least, this is luck. If a team is lucky at scoring (high entropy for) and its opponents have little luck scoring against it (low entropy against), it is going to perform well.
Ultimately there seem to be two tiers of teams: those with a high ratio of for to against, and those with a high ratio of against to for. If you are in the latter category you will be at the bottom of the table; if you are in the former, at the top. Generally, the teams with the highest ratio of for to against will be among the top 4 or 5.
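To make the ratio concrete, here is a small sketch that ranks teams by entropy for divided by entropy against. The team names and entropy values are placeholders, not real results, and the handling of a zero entropy against is an assumption added only to avoid dividing by zero.

```python
def entropy_ratio(entropy_for: float, entropy_against: float) -> float:
    """Ratio of entropy for to entropy against.

    Assumption: if entropy against is 0, return infinity when entropy for
    is positive, and NaN when both are 0.
    """
    if entropy_against == 0:
        return float("inf") if entropy_for > 0 else float("nan")
    return entropy_for / entropy_against


# Placeholder season totals: team -> (entropy for, entropy against).
season = {
    "Team A": (6.0, 4.0),
    "Team B": (3.0, 6.0),
    "Team C": (5.0, 5.0),
}

# Highest ratio first, mirroring the "for to against" tier.
ranked = sorted(season, key=lambda team: entropy_ratio(*season[team]), reverse=True)
```

In this toy example Team A has a ratio of 1.5 and sorts first, while Team B, with a ratio of 0.5, sorts last.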
Let’s look at matchweek 8 of the 20/21 season and see if it has any information about the current standings (this is being written after matchweek 12 but during matchweek 13).
Southampton comes out on top for entropy for, and second for the ratio of for to against, but was fourth in the league table. One thing this points to is that Southampton has been more random at scoring goals (goals going in for them), while other teams' ability to score against them has been more consistent.
As of matchweek 12…
Southampton and Aston Villa are still on top for entropy, but Tottenham and Liverpool occupy the top two places in the table. I think this suggests that Southampton and Aston Villa have been playing very good defense and have been lucky with their goals. They will probably occupy high spots as the season continues, if they are able to continue this way.
On the other hand, it would not be surprising if Tottenham fell off the top spot over the rest of the season, because their entropy ratio is quite close to 1. Historically, it seems that teams with a ratio over 1.5 end up in the top spots, while those below 1 find themselves at the bottom of the table. A team does not want to be unlucky in goals against.
An interesting case is Burnley in 17/18 and 19/20. Their ratio of entropy for to against indicates a great performance, yet they did not finish near the top of the league table (7th in 17/18 and 10th in 19/20). This may be partly explained by their goalkeeper Nick Pope (save percentage in 17/18 was .79 and in 19/20 was .71). For comparison, in those two seasons Manchester United’s David de Gea had .80 and .73, and Manchester City’s Ederson had .67 and .73 respectively. So here the consistency of goalkeeping was not matched by luck in goal scoring. See here:
If you would like to see the full tables for entropy and the previous seasons, you can access them here: