Fantasy Analytics — Fixture Difficulty
One of the hardest parts of playing fantasy sports (and there are many), specifically Fantasy Premier League (FPL), is determining the difficulty of a match. FPL includes a Fixture Difficulty Rating on a scale of 1 to 5 (1 being easiest for the team you are viewing), but it is not entirely clear how it is calculated. I do know it is based on the most recent matches played, but not how those results are turned into a rating. In many people’s opinions, it is also not always very accurate. I also find that it tends to regress to the mean of 3, which feels like a coin toss.
Fixture difficulty can be used in different ways. For one, it might help you select a team captain. It also places the number of points a player earns in context. A player averaging 6 points per match over four games with an average difficulty of 2 is different than a player with the same points over an average difficulty of 4.
So, what ways might there be to determine fixture difficulty? I do actually think the eyeball test has a place here. If you keep up to date with the performance of a team, then you probably have a good intuitive sense of how difficult the fixture might be. However, I have been experimenting with two different approaches that use uncertainty and probability to measure how difficult a match may be for a team.
Entropy
Entropy is a measure of uncertainty and randomness. I have experimented with this in the past to understand goalkeepers and squad performance. Anecdotally, what entropy measures is how consistent a team is at scoring and being scored against. A team with high entropy for its goals for tends to be “lucky” or good at scoring, while a team on the lower end tends to be bad at scoring.
To calculate it, take the goals and shots for each side in each match and divide the goals by the shots. This is an estimate of the probability of scoring during that match for the home and away teams. Take the log of this probability. Then, multiply the probability by its log and by the number of goals scored. Finally, sum these values across each team’s matches, separately for goals for and goals against, and multiply by -1. This is the measure of entropy called information gained.
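The steps above can be sketched in Python as follows. This is a minimal sketch rather than the exact code behind the numbers in this post: the function name `goal_entropy`, the tuple layout for a match, and the use of the natural log are all assumptions, and the sample matches are made up.

```python
import math
from collections import defaultdict


def goal_entropy(matches):
    """Return {team: {"for": value, "against": value}}.

    Each match is a tuple (assumed layout):
    (home, away, home_goals, home_shots, away_goals, away_shots)
    """
    totals = defaultdict(lambda: {"for": 0.0, "against": 0.0})
    for home, away, home_goals, home_shots, away_goals, away_shots in matches:
        sides = (
            (home, away, home_goals, home_shots),
            (away, home, away_goals, away_shots),
        )
        for scorer, conceder, goals, shots in sides:
            term = 0.0
            if goals > 0 and shots > 0:
                p = goals / shots  # estimated probability of scoring in this match
                term = -goals * p * math.log(p)  # natural log is an assumption
            # The same term counts as "for" the scorer and "against" the conceder.
            totals[scorer]["for"] += term
            totals[conceder]["against"] += term
    return {team: dict(values) for team, values in totals.items()}


# Made-up matches, for illustration only.
sample = [
    ("Team A", "Team B", 2, 10, 1, 5),
    ("Team B", "Team A", 0, 8, 3, 12),
]
for team, values in goal_entropy(sample).items():
    print(team, values)
```

A match with no goals contributes nothing to either total, which is why a scoreless side simply keeps a running value of zero.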
For example, for 2020–2021, I calculated Manchester City’s goals-for entropy as 36.3 and their goals-against entropy as 16.08. To use this as a measure of fixture difficulty, take the inverse of the ratio of the goals-for entropy of the team you are measuring to the goals-against entropy of its opponent. The inverse is not really necessary, but I find it easier to intuit that larger numbers are more difficult and smaller numbers are easier.
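Read literally, that ratio works out to the opponent’s goals-against entropy divided by the team’s goals-for entropy. Here is a small sketch of that reading. The function name is hypothetical, and the opponent value in the example is a placeholder, not a real team’s figure.

```python
def entropy_difficulty(team_entropy_for, opponent_entropy_against):
    """Inverse of (team goals-for entropy / opponent goals-against entropy)."""
    if team_entropy_for <= 0:
        raise ValueError("goals-for entropy must be positive")
    return opponent_entropy_against / team_entropy_for


# 36.3 is Manchester City's goals-for entropy quoted above;
# 20.0 is a placeholder for an opponent's goals-against entropy.
print(round(entropy_difficulty(36.3, 20.0), 2))
```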
Using Manchester City vs Sheffield United as an example, the difficulty for Manchester City is 0.16 and for Sheffield United is 2.1. When the two Manchester clubs met, Manchester City faced a difficulty of 0.62, whereas Manchester United faced a difficulty of 0.49.
Wait, easier for Manchester United? Manchester City won the league. This is one of the limitations of the data we have. Manchester United let in a total of 44 goals from 418 shots (about 10.5%), and Manchester City let in 32 goals from 285 shots (about 11.2%). The probability of scoring against Manchester City was slightly higher than against Manchester United, even though Manchester United’s totals were much bigger. In the end, the first match was a draw, and Manchester United won the second by two goals.
Probabilities
There are obvious limitations with the entropy approach. Another approach is to use the probability that a team will score as the estimate of how difficult a fixture will be. It has been demonstrated elsewhere that the number of goals scored follows a Poisson probability distribution. Using the average number of goals scored in different scenarios, you can estimate how likely a team is to score k goals. For example, Manchester City scores an average of 2.3 goals at home. With that average, the probability of Manchester City scoring 0 goals at home is about 10%, and of scoring exactly 2 goals is about 27%. They are much more likely to score than not.
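As a quick check of the Poisson step, here is a short sketch using only the Python standard library. The helper name `poisson_pmf` is my own, and the only input taken from this post is the home average of 2.3 goals.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals follow a Poisson(lam) distribution."""
    return math.exp(-lam) * lam**k / math.factorial(k)


home_average = 2.3  # Manchester City's home scoring average quoted above

for k in range(6):
    print(f"P({k} goals) = {poisson_pmf(k, home_average):.1%}")

# Probability of scoring at least once is the complement of scoring zero.
print(f"P(at least one goal) = {1 - poisson_pmf(0, home_average):.1%}")
```

Swapping in a different average, such as an away average or an expected goals figure, only changes the `lam` argument.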
Using this information, you can take the team’s probability of scoring k goals and multiply it by the opponent’s probability of conceding k goals, across a range of scenarios such as 0 to 5 goals. Each of these products is then multiplied by a difficulty rating and summed to get the difficulty of a fixture for that team.
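One way to sketch that weighted sum in Python is shown below. The weights are an assumption on my part: they run evenly from 5 for zero goals (hardest) down to 1 for five goals (easiest), and the function names and input averages are illustrative rather than the exact setup behind the numbers in this post.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals follow a Poisson(lam) distribution."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def fixture_difficulty(team_avg_for, opp_avg_against,
                       weights=(5.0, 4.2, 3.4, 2.6, 1.8, 1.0)):
    """Weighted sum over k = 0, 1, ... of P(team scores k) * P(opponent concedes k).

    weights[k] is the difficulty rating attached to the k-goal scenario
    (assumed values, evenly spaced from 5 down to 1).
    """
    total = 0.0
    for k, weight in enumerate(weights):
        p_score = poisson_pmf(k, team_avg_for)
        p_concede = poisson_pmf(k, opp_avg_against)
        total += weight * p_score * p_concede
    return total


# Illustrative averages only: a strong attack and a weak attack
# facing the same defence (1.5 goals conceded per match).
print(round(fixture_difficulty(2.3, 1.5), 2))
print(round(fixture_difficulty(0.8, 1.5), 2))
```

Because the zero-goal scenario carries the largest weight, a team that rarely scores ends up with a higher difficulty against the same opponent.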
The averages for and against are determined by the team and by whether it is a home or away fixture. Depending on your model, this may end up being redundant information if you also include a variable for home or away.
For me, I set the difficulty on a scale of 5 to 1. Looking at the fixture difficulty results for 2020–2021, the average was 1.01, the median 0.98, the upper quantile 1.4, and the lower quantile 0.7. These cutoffs might be useful for defining a categorical grouping (easy, moderate, average, difficult).
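If you want the categorical version, a simple binning sketch might look like this. Which cutoff separates which pair of labels is my assumption, as are the names `CUTOFFS`, `LABELS`, and `difficulty_category`. Only the three cutoff values and the four labels come from the text above.

```python
from bisect import bisect_right

# Lower quantile, median, and upper quantile quoted above.
CUTOFFS = (0.7, 0.98, 1.4)
# Assumed mapping: one label per band, from lowest to highest difficulty.
LABELS = ("easy", "moderate", "average", "difficult")


def difficulty_category(score):
    """Return the label of the band that the difficulty score falls into."""
    return LABELS[bisect_right(CUTOFFS, score)]


for score in (0.5, 0.9, 1.2, 1.6):
    print(score, difficulty_category(score))
```

A score exactly on a cutoff falls into the higher band, which is a side effect of using `bisect_right`.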
Some examples: Manchester City (H) vs Sheffield United (A) had a difficulty of 0.64 (City won 1 to 0). Sheffield United (A) vs Newcastle United (H) had a difficulty of 0.96 (Newcastle won 1 to 0). Finally, Sheffield United (A) vs Liverpool (H) had a difficulty of 1.5 (Liverpool won 2 to 1).
One way we might think about these numbers is that the first match listed above was about 0.64 times as difficult as the average, and the last match about 1.5 times as difficult as the average.
You could also use a measure like expected goals for or against as lambda in the Poisson distribution.