Visual analysis for Fantasy Sports
Although I’m relatively new to the world of fantasy football (English Premier League and Bundesliga) and have made plenty of missteps, it is a great way to work with sports data metrics. What I mean is that you can use the data at hand to evaluate players you want on your fantasy team. It is not an exact analogue to scouting, but you still have to find the best players that fit within a budget. To do this, you need to find players that are performing well. Ideally, these will be affordable players that are outperforming other players and are owned by fewer managers. However, one of the lessons I’ve learned since the beginning of the 2020–2021 season is that you cannot rely on this alone: when premium players (think Salah and Lewandowski) do well and you do not own them, the cohort of managers who do own them moves up. It’s a balance, of course.
The different leagues’ fantasy systems also affect how you might approach selecting players. For example, in Fantasy EPL you need to focus on players that can score goals and defenders on teams that keep clean sheets. Those are really the two main ways to earn points: goals and clean sheets (with assists coming in third). With Fantasy Bundesliga, you earn points for those as well, but also for the team winning and for different performance metrics, like a pass that leads to a shot on goal or winning a duel. This really opens up the possibility of finding a range of players that will perform well weekly. While I enjoy both leagues, I find the Bundesliga’s approach to be more fun from an analytics perspective.
My initial approach to finding players in Fantasy Bundesliga (FB) was to use the different player performance variables found on FBREF. I used data from the 2019–2020 season and a factor analysis to weight the variables before combining them. Using those weights, I would rank the players in the 2020–2021 season as the matches progressed. My initial inspiration was to reduce the variables into the EPL’s Influence, Creativity, and Threat categories (the combined ICT index). The writeup, with all its flaws, is still available here: https://github.com/davidlamb/SpatialSoccerAnalysis/blob/master/fantasy/Bundesliga_ICT.md and a visualization is available here: https://public.tableau.com/views/BundesligaICT/BundesligaICT?:language=en&:display_count=y&:origin=viz_share_link
This worked alright, I believe, for the first few weeks as more data became available. However, as the season rolled on, it smoothed out the fluctuations in performance too much. For example, a player might still be rated highly in the ICT because the index took into account the whole season up to that point and was barely affected by poor performance or injury in the most recent weeks. So I’ve started to adapt my approach to finding players, and will share some of my techniques.
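To make the "weight, combine, rank" idea concrete, here is a minimal sketch. The players, the stats, and especially the weights are made up for illustration; in the writeup the weights came from a factor analysis of the previous season, which is not reproduced here.

```python
import numpy as np

# Hypothetical per-match stats for four players (rows) and three variables (columns).
players = ["Player A", "Player B", "Player C", "Player D"]
stats = np.array([
    [0.9, 2.1, 1.0],
    [0.2, 1.0, 3.5],
    [0.5, 3.0, 2.0],
    [0.1, 0.5, 0.8],
])

# Illustrative weights only; they are NOT derived from a factor analysis.
weights = np.array([0.5, 0.3, 0.2])

def weighted_index(values, weights):
    """Standardise each column, then combine the columns with the given weights."""
    z = (values - values.mean(axis=0)) / values.std(axis=0)
    return z @ weights

index = weighted_index(stats, weights)

# Sort players from highest to lowest index value.
ranking = [players[i] for i in np.argsort(-index)]
print(ranking)
```

Standardising first keeps a variable with large raw values from dominating the index simply because of its scale.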
1. Use data reduction techniques
As I mentioned above, my original approach was to use a factor analysis. In that case, I tried reducing the data to three main factors, then turned those into weights to combine the variables into an index of sorts. Instead, you can use a Principal Component Analysis (lots of tutorials are available on this) to reduce the data to its principal components. For example, here I’ve reduced about 20 variables from FBREF to the first two principal components. This allows me to visualize the data on two axes to highlight some differences amongst the defenders in the Bundesliga.
One of the more interesting locations is the bottom right corner, where Angelino and Guerreiro reside. They are extremes in both components (high in the first and negative in the second). They stand out because they play almost a midfield position and have more scoring opportunities than other players listed as defenders. This is good for Fantasy Bundesliga because you will get more points for their attempts at goal (and bad if they are on a team that concedes goals, because they will receive negative points). So basically, players on the high end of the First Component will tend to have more chances at scoring than those on the lower end (e.g. Javi Martinez with 0 goal attempts and a First Component value of 0.2). The second component seems to be an amalgamation of different defensive variables, and is a little harder to interpret. Still, this chart makes it easy to explore different defenders.
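As a rough sketch of that reduction, the snippet below projects a table of player stats onto its first two principal components using NumPy’s SVD. The data is random and only stands in for the FBREF columns; the function name `pca_scores` is my own, and a library such as scikit-learn would normally be used instead.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto their first principal components.

    Columns are standardised first so that no single stat dominates
    purely because of its scale.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardised data: the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, Vt[:n_components], explained[:n_components]

# Made-up data: 30 "defenders" with 6 stats each.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
X[:, 1] += 0.8 * X[:, 0]  # make two columns correlated, as real stats often are

scores, components, explained = pca_scores(X)
print(scores.shape)        # one (x, y) point per player
print(explained.round(2))  # share of variance captured by each component
```

The two columns of `scores` are what would go on the x and y axes of a scatter plot like the one described here.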
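One way to get a handle on what a component means is to look at which variables load most heavily on it. The sketch below does this on made-up data in which two hypothetical "attacking" columns and three hypothetical "defensive" columns move together; the column names are placeholders, not the actual FBREF fields.

```python
import numpy as np

# Hypothetical column names and made-up data standing in for FBREF stats.
columns = ["shots", "key_passes", "tackles", "interceptions", "clearances"]
rng = np.random.default_rng(1)
attack = rng.normal(size=40)
defence = rng.normal(size=40)
X = np.column_stack([
    attack + 0.1 * rng.normal(size=40),   # shots
    attack + 0.1 * rng.normal(size=40),   # key_passes
    defence + 0.1 * rng.normal(size=40),  # tackles
    defence + 0.1 * rng.normal(size=40),  # interceptions
    defence + 0.1 * rng.normal(size=40),  # clearances
])

# Standardise, then take the principal axes from the SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)

def top_loadings(component, names, k=3):
    """Return the k variables with the largest absolute loading."""
    order = np.argsort(-np.abs(component))[:k]
    return [(names[i], round(float(component[i]), 2)) for i in order]

for i in range(2):
    print(f"Component {i + 1}:", top_loadings(Vt[i], columns))
```

Keep in mind that the sign of a principal component is arbitrary, so only the relative signs and sizes of the loadings within a component carry meaning.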
2. Compare most recent weeks to previous weeks
Another flaw I noticed with my original approach is that, as the season continues, the index relies heavily on past performance rather than recent performance. This tends to miss players that may be out through injury or left out of the lineup. So it is always useful to compare recent performance to past performance. Now, this assumes there are enough data points available. Below I compare the most recent 5 weeks of performance with the rest of the season since the start. But you could also compare to the previous season’s performance.
This first chart compares each player’s points per matchweek earlier in the season with their points over the past five matches. I think this helps in two ways: finding consistent players, and avoiding knee-jerk reactions to one match. But you also need enough data to do this. Also, the choice of five weeks as the cutoff point is arbitrary. Five matches seemed like enough to smooth out the fluctuations. Of course, splitting the data in two may also be sufficient.
In the top right-hand corner of the plot we have players that have done really well and continue to do really well. The peak example is Lewandowski, who scored 15 points per match up to gameweek 13, and 15 points in the most recent 5 gameweeks. Silva and Muller have also been very consistent this season. On the other hand, Sancho has been performing much better in the last five weeks than up to gameweek 13. There may be many reasons for a drop-off or an improvement in points earned: injury or playing time come to mind. So this can only be used to flag certain players, who then require a more in-depth look.
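The split behind a chart like this can be sketched in a few lines. The players and weekly points below are invented, and `split_averages` is just an illustrative helper name; the five-week window is the same arbitrary cutoff discussed above.

```python
from statistics import mean

def split_averages(points_by_week, recent_n=5):
    """Return (earlier_avg, recent_avg) points per matchweek for one player.

    points_by_week is a list ordered by matchweek. Whether missed weeks are
    left out or recorded as 0 changes what the averages mean.
    """
    if len(points_by_week) <= recent_n:
        raise ValueError("not enough matchweeks to split")
    earlier = points_by_week[:-recent_n]
    recent = points_by_week[-recent_n:]
    return mean(earlier), mean(recent)

# Made-up weekly points for two hypothetical players.
weekly_points = {
    "Steady Striker": [12, 15, 14, 13, 16, 15, 14, 15, 13, 16],
    "Late Bloomer": [2, 3, 1, 4, 2, 9, 11, 10, 12, 8],
}

for name, pts in weekly_points.items():
    earlier, recent = split_averages(pts)
    change = recent - earlier
    print(f"{name}: earlier={earlier:.1f} recent={recent:.1f} change={change:+.1f}")
```

Plotting the earlier average on one axis and the recent average on the other puts consistent players near the diagonal, with improving players on the recent-average side of it.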
3. Putting them together
Then you can combine these all into one chart. Here I’ve used PCA to create an average score based on values for the whole season, then applied it to the most recent data and to the past data. This lets me see performances that are relatively stable over time, increasing, or decreasing.
Not only is Lewandowski doing well in terms of points, but according to this analysis, he has improved over the last five gameweeks (GW 18 at the time of writing). Kramaric, on the other hand, started very well but then didn’t sustain this (it could be that the schedule and the teams they were playing helped). A couple of rising stars have been Jovic and Demirovic. At least, according to this chart.
To conclude, you shouldn’t rely on one chart or one perspective, but use each to identify specific players that can be evaluated further (watching highlights, performance over time, etc.).
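One possible reading of "fit on the whole season, then apply to each period" is sketched below. It is an assumption on my part that the score is the first principal component; the data is random, and the whole-season values are simply taken as the mean of the two periods for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_players, n_stats = 25, 5

# Made-up per-player averages for the two periods (stand-ins for real stats).
earlier = rng.normal(loc=1.0, size=(n_players, n_stats))
recent = earlier + rng.normal(scale=0.5, size=(n_players, n_stats))
# Whole-season values: here just the mean of the two periods.
season = (earlier + recent) / 2

# Fit the scaling and the first principal axis on the whole season only.
mu, sigma = season.mean(axis=0), season.std(axis=0)
_, _, Vt = np.linalg.svd((season - mu) / sigma, full_matrices=False)
axis1 = Vt[0]

def score(values):
    """Project stats onto the season-level first component."""
    return ((values - mu) / sigma) @ axis1

earlier_score, recent_score = score(earlier), score(recent)
change = recent_score - earlier_score  # > 0 means a higher score recently

# Indices of the three players whose score rose the most.
print(np.argsort(-change)[:3])
```

Because the same scaling and axis are used for both periods, the two scores are directly comparable. The sign of the component is arbitrary, though, so check the loadings before reading a higher score as better.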