Our team set out to build an exceptional football team for less than €100M. Using the FIFA 2020/2021 dataset, drawn from the video game, we built a prediction model to find the key performance attributes for each position. We then used these to pick out a team of excellent but undervalued players.
Much of the football data world that we see is descriptive. Commentators tell us about Roberto Firmino's poor scoring record away from home, pundits marvel at the distance run by Bernardo Silva, web applications let us compare the number of key plays made by João Félix and Luis Suárez, and some even go as far as giving us the xG (expected goals) metric. This is useful and interesting information, but it is not data science. There is an intoxicating world of football stats that lies behind this, hidden from public view and reserved for club analysts and official channels with the resources and finance to pay for it.
This is the realm of statisticians and scouting teams, where models are applied to the minutiae of match events and clustering is deployed in attempts to find the next emerging superstar. Rather than telling us about performance, data science techniques are harnessed to affect performance. This can mean adjusting tactics, finding weaknesses or, crucially and often at a hefty price, bringing in new players that closely match a club's requirements.
Plugging holes in football teams with new players is a risky business. Transfers involve a multitude of variables that are difficult to control and, with price tags and agent fees soaring to dizzying heights, it's hardly any wonder that new signings feel enormous pressure to perform. One thing that clubs can control is ensuring that their transfer targets are genuine prospects. Scouting teams analyze the characteristics of thousands of players, match these against each player's value to find opportunities and ultimately decide to make a move on one.
This is the method put forward by our ambitious team of football fans and data nerds: find a team of exceptional footballers on a budget. The reputation of a player is often worth millions in the world of big-name transfers and does not always represent the player's true ability. Standing in the shoes of Billy Beane, the inspiration behind Moneyball, whose famous journey with the Oakland A's baseball team proved a point about sports players flying under the radar, we were confident that we could build a team of outstanding talent without leaving a hole in the bank account.
To start our analysis we needed a dataset. Companies like OPTA, StatsBomb and WyScout collect detailed, granular datasets on everything from the coordinates of a James Ward-Prowse free-kick as it moves through the air to the effect that Adama Traore's summer workout regime has had on his thigh muscles. These are costly and, for a project of this scope, we needed something publicly available. Thankfully, the data science community has a workaround (they tend to be pretty useful with this kind of thing): the FIFA 2020/2021 dataset, containing microdata on all 14620 active players listed in the most recent release.
We set out to build a team of exceptional players for under €100M. First, we needed to understand what characteristics make a player exceptional. To do this, we built a prediction model using 35 performance attributes to predict the value of a player.
We set our budget at €100M and set out to find 11 players. That works out to an average of roughly €9M per player. To put the challenge in context, Kylian Mbappe, the most expensive player in the dataset, is valued at €105M, which is €5M more than our entire team budget. The Liverpool starting XI is worth €600M, and Sevilla's team, itself driven by data and Monchi, is priced at twice our cap.
Each row in the FIFA dataset represents one player, listed alongside over 35 variables describing their characteristics. Every player is given a rating out of 100 for skills like Short Passing, Composure and Speed and, importantly, a Value - a figure closely based on a player's actual worth. With the data in hand and cleaned up a little, our team built a prediction model to understand which characteristics were most influential in determining a player's value. We set Value as our target variable and listed every skill characteristic as a factor.
What Makes a Good Player - Great?
The characteristics of football players vary hugely between positions. The qualities that make Erling Haaland the most sought-after forward in world football today are not the same qualities that have arguably made Bruno Fernandes the best transfer in Premier League history. Different positions come with different demands.
For this reason, we decided to focus our work on identifying the distinct key performance characteristics of goalkeepers, defenders, midfielders and forwards. The team recognised that there would be further opportunity to delve deeper into the specifics of each sub-position (what distinguishes a right back from a right wing-back, for instance), and club analysts would have a field day here. But considering the scope of our project and the fact that, despite our delusions, we are not yet football managers, we stuck to these four, admittedly general, positions.
Our first prediction model incorporated all 14620 players in the data.
Next, we built four more prediction models using the same flow: 35 performance attributes as factors and the value of a player as the model's target. This time, we filtered the players in each project so that each model would analyze only one of the four general positions. With projects built specifically to analyze goalkeepers, defenders, midfielders and forwards, we were ready to home in on the key performance attributes of our players. Ultimately, we hoped to pick out the five characteristics for each position that are most significantly related to the ability of a player. Or, to put it simply: what defines the difference between a top striker and a poor striker?
Whilst executing the projects, Graphext transformed our quantitative value targets into handy quartiles. For each project, this left us with a new categorical rank - Value Quartiles - that classed each player as having either a High, Medium-High, Medium-Low or Low value. These quartiles were the secret to finding which skills were driving up the price of players. If we could recognise these skills then we could also recognise the players possessing them and snap up the cheapest of the bunch.
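For readers who want to reproduce the quartile step outside Graphext, the binning can be sketched in a few lines of Python. This is only an illustration: the source does not describe how Graphext computes its cut points, so the use of `statistics.quantiles` with the inclusive method is an assumption, and the player names and values are made up.

```python
from statistics import quantiles

# Hypothetical player values in millions of euros (illustrative, not from the dataset).
values = {
    "Player A": 0.5,
    "Player B": 1.2,
    "Player C": 2.0,
    "Player D": 3.5,
    "Player E": 6.0,
    "Player F": 9.0,
    "Player G": 25.0,
    "Player H": 60.0,
}

LABELS = ["Low", "Medium-Low", "Medium-High", "High"]


def value_quartiles(values):
    """Map each player to a Value Quartile label.

    Assumption: cut points are the three quartile boundaries of the
    observed values; Graphext may bin differently.
    """
    cuts = quantiles(list(values.values()), n=4, method="inclusive")
    labels = {}
    for name, value in values.items():
        # Count how many cut points the value exceeds: 0 -> Low ... 3 -> High.
        index = sum(value > cut for cut in cuts)
        labels[name] = LABELS[index]
    return labels


labels = value_quartiles(values)
for name, label in labels.items():
    print(f"{name}: {label}")
```

With eight players, each label ends up with two players, which is the behaviour you would want from a quartile split.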
Using the categorical quartiles for the value of players, we compared the skill sets of High, Medium-High, Medium-Low and Low value players to determine which skills cause players to be assigned higher values.
For each position, we opened up the Value Quartiles in Graphext's compare panel and generated a series of charts explaining the difference between the quartile ranges. These charts, signifying the relevance of each performance attribute in relation to the value of a player, allowed us to identify the most important performance attributes of footballers in our four general positions.
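A rough stand-in for this comparison, for anyone without access to the compare panel, is to rank attributes by the gap between the average rating of High and Low value players. This is a swapped-in technique, not Graphext's relevance measure, and the players and ratings in the sketch are hypothetical.

```python
from statistics import mean

# Hypothetical forwards: ratings out of 100 plus a Value Quartile label.
players = [
    {"quartile": "High", "Positioning": 88, "Finishing": 86, "Strength": 70},
    {"quartile": "High", "Positioning": 84, "Finishing": 85, "Strength": 74},
    {"quartile": "Low", "Positioning": 58, "Finishing": 60, "Strength": 71},
    {"quartile": "Low", "Positioning": 54, "Finishing": 61, "Strength": 69},
]


def attribute_gaps(players, attributes, top="High", bottom="Low"):
    """Rank attributes by mean rating in `top` minus mean rating in `bottom`."""
    gaps = {}
    for attr in attributes:
        top_mean = mean(p[attr] for p in players if p["quartile"] == top)
        bottom_mean = mean(p[attr] for p in players if p["quartile"] == bottom)
        gaps[attr] = top_mean - bottom_mean
    # Largest gap first: these attributes separate the quartiles most clearly.
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


ranking = attribute_gaps(players, ["Positioning", "Finishing", "Strength"])
for attr, gap in ranking:
    print(f"{attr}: {gap:+.1f}")
```

In this toy data, Strength barely differs between the two groups, so it falls to the bottom of the ranking, while Positioning and Finishing separate them clearly.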
The qualities distinguishing ranks of forwards concerned a player's ability to take chances in the final third of the pitch. Positioning ranked as most important, with Finishing and Ball Control close behind, followed by Reactions and Shot Power.
For midfielders, Ball Control and Reactions also featured. Alongside these, Vision, Dribbling and Short Passing emerged as the most significant performance characteristics of a midfielder.
The attributes best explaining the difference between ranks of Defenders were all about their ability to cut out chances in front of goal. Interceptions, Standing Tackle, Sliding Tackle and Defensive Awareness were considered to be the most significant variables here, whilst Reactions featured once again.
The key characteristics of Goalkeepers stand apart from those of outfield players, primarily because of their unique role in a match. In order of relevance, we identified Reflexes, Diving, Positioning, Handling and Reactions as the performance attributes best distinguishing between our ranks of Goalkeepers.
Having identified the characteristics most important when determining the overall quality of a player, our team set out to find our hidden gems. To do this, we needed a method.
- Each player must be positioned on the Graph as close as possible to the best players in their position. This suggests commonalities between our shortlist and the game's greatest footballers.
- Individual players must not cost more than €10M.
- Players must not be older than 28.
To interrogate the dataset using this method, we applied a combination of filters to the data inside the Graph so that only players who met our criteria were shown. First, we filtered by age to omit players over the age of 28. Then, we filtered out players that cost more than €10M. Repeating this process for players in each position left us with a Graph showing only potential prospects.
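The age and value filters are simple enough to express in code. The sketch below applies the two numeric rules from our method to a handful of made-up players; the field names (`age`, `value_m`, `position`) are hypothetical, and the first rule, closeness to the best players on the Graph, is a visual judgement that is not modelled here.

```python
MAX_AGE = 28        # players must not be older than 28
MAX_VALUE_M = 10.0  # players must not cost more than 10M euros

# Hypothetical players (illustrative, not from the dataset).
players = [
    {"name": "Player A", "age": 24, "value_m": 8.0, "position": "Forward"},
    {"name": "Player B", "age": 31, "value_m": 6.5, "position": "Forward"},
    {"name": "Player C", "age": 27, "value_m": 22.0, "position": "Forward"},
    {"name": "Player D", "age": 28, "value_m": 10.0, "position": "Defender"},
]


def prospects(players, position):
    """Keep players in `position` who satisfy the age and value limits."""
    return [
        p for p in players
        if p["position"] == position
        and p["age"] <= MAX_AGE
        and p["value_m"] <= MAX_VALUE_M
    ]


forward_prospects = prospects(players, "Forward")
defender_prospects = prospects(players, "Defender")
print([p["name"] for p in forward_prospects])
print([p["name"] for p in defender_prospects])
```

Both limits are inclusive, so a 28-year-old valued at exactly €10M still qualifies.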
Using a combination of filters corresponding to our method, we reduced the players in our data so that our Graph presented only genuine prospects.
Next, we zoomed in on the value range €0 - €10M and applied color and size mapping to this variable to give us an immediate indication of the value of players as we scanned the Graph. Finally, using the position of each player on the Graph, we created shortlists for each position.
At this stage, it became essential to consider candidates from a variety of positions in a football team. We didn't want to end up with a defence composed solely of left wing-backs. Filtering with the Best Position variable, we created our shortlist to cover a range of sub-positions in both defence and midfield.
The Team that Data Built
Now it was time to build our team. With our shortlist in hand, our team put on their manager's tracksuits and turned their attention to choosing the best selection of players from the candidates we had identified.
Creating the shortlist using the age and value filters highlighted the top players flying under the radar. Choosing players according to their position on the Graph gave us a high level of confidence in the ability of the players. But there was still some noise in the data.
To be confident that we were picking the right players for each position, we cross-checked our shortlist with each of the skills areas we had identified as significant for that specific position.
Before settling on a dream team, we cross-checked our shortlisted players with the key skills we identified for each position, ensuring that we chose the players with the highest ratings for these skills - whilst also considering the value of the players.
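One way to make this cross-check repeatable is to score each shortlisted player on the key skills for their position. The skill lists below are the ones identified earlier in this article; the unweighted average, the tie-break on value and the shortlisted players themselves are assumptions made for the sketch, not a record of how we actually weighed the candidates.

```python
from statistics import mean

# Key skills per general position, as identified in the article.
KEY_SKILLS = {
    "Forward": ["Positioning", "Finishing", "Ball Control", "Reactions", "Shot Power"],
    "Midfielder": ["Ball Control", "Reactions", "Vision", "Dribbling", "Short Passing"],
    "Defender": ["Interceptions", "Standing Tackle", "Sliding Tackle",
                 "Defensive Awareness", "Reactions"],
    "Goalkeeper": ["Reflexes", "Diving", "Positioning", "Handling", "Reactions"],
}

# Hypothetical shortlisted forwards (ratings out of 100, value in millions of euros).
shortlist = [
    {"name": "Forward A", "value_m": 9.0, "Positioning": 80, "Finishing": 78,
     "Ball Control": 76, "Reactions": 77, "Shot Power": 79},
    {"name": "Forward B", "value_m": 8.0, "Positioning": 82, "Finishing": 80,
     "Ball Control": 78, "Reactions": 79, "Shot Power": 81},
    {"name": "Forward C", "value_m": 6.0, "Positioning": 70, "Finishing": 72,
     "Ball Control": 71, "Reactions": 69, "Shot Power": 73},
]


def rank_shortlist(shortlist, position):
    """Order players by average key-skill rating; cheaper player wins ties."""
    skills = KEY_SKILLS[position]
    scored = [(mean(p[s] for s in skills), p) for p in shortlist]
    scored.sort(key=lambda pair: (-pair[0], pair[1]["value_m"]))
    return [(p["name"], score) for score, p in scored]


ranked = rank_shortlist(shortlist, "Forward")
for name, score in ranked:
    print(f"{name}: {score}")
```

A weighted average that favours the top-ranked skill for each position would be a natural refinement.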
The 11 players we decided upon are valued at a combined €89M and have an average age of 27.
Goalkeeper: P. Gazzaniga - Age: 28 - Value: €8M
Left Back: R. Rodriguez - Age: 27 - Value: €9M
Centre Back: K. Vogt - Age: 28 - Value: €8M
Centre Back: S. Denswil - Age: 27 - Value: €4.5M
Right Back: D. Zappacosta - Age: 28 - Value: €8M
Left Wing: R. Skov - Age: 24 - Value: €8M
Centre Midfield: D. Pröpper - Age: 28 - Value: €7.5M
Centre Attacking Midfield: S. Verdi - Age: 27 - Value: €9.5M
Right Wing: R. Centurión - Age: 27 - Value: €9.5M
Forward: J. Ayew - Age: 28 - Value: €9M
Forward: A. Hunou - Age: 26 - Value: €8M
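The squad totals can be checked directly from the list above. The short script below uses the ages and values exactly as listed, in the same order, and confirms the combined value, the average age and the two per-player limits.

```python
# (age, value in millions of euros) for the 11 players, in the order listed above.
squad = [
    (28, 8.0),
    (27, 9.0),
    (28, 8.0),
    (27, 4.5),
    (28, 8.0),
    (24, 8.0),
    (28, 7.5),
    (27, 9.5),
    (27, 9.5),
    (28, 9.0),
    (26, 8.0),
]

BUDGET_M = 100.0

total_value = sum(value for _, value in squad)
average_age = sum(age for age, _ in squad) / len(squad)

print(f"Players: {len(squad)}")
print(f"Total value: {total_value}M euros (budget {BUDGET_M}M)")
print(f"Average age: {average_age:.1f}")

# Every player also respects the per-player limits from our method.
within_limits = all(age <= 28 and value <= 10.0 for age, value in squad)
print(f"All within age and value limits: {within_limits}")
```

That leaves €11M of the budget unspent.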