COSC3000 - Visualization, Computer Graphics & Data Analysis
Data Visualisation Project Report
Manchester United: The Ferguson Years
An analysis of Manchester United's performance in the English Premier League between 1995 and 2013, with selected visualisations.
1 Introduction
“Some people believe football is a matter of life and death, I am very disappointed with that attitude. I can assure you it is much, much more important than that. . . ” - Bill Shankly (Former footballer and manager)
For as long as competitive sport has existed, so too has the desire to be (or at least to support) the very best. Whether or not this is a positive development, modern sport has become a professional endeavour, with vast sums of money involved in nearly every aspect. This has led to increasingly sophisticated techniques being used to analyse outcomes, improve performance, and predict results.
While sports statistics and analytics have long been used in this regard, the use of sabermetrics in baseball, and the subsequent book and film Moneyball, created an upsurge in their popularity and usage. One sport that is notoriously difficult to analyse, due to its low scores and tight margins, is association football.
Manchester United Football Club is one of the most recognisable names in world sport. This is due in large part to the club's success on the field, especially in modern times under now-retired manager Sir Alex Ferguson. Competing mainly in the English Premier League, they have been crowned champions a record 13 times since the competition's inception in 1992.
1.1 Aims
This report will examine English Premier League data between 1995 and 2013 in an attempt to determine both the extent of Manchester United's success and the potential reasons for it. This will be achieved mainly through data visualisation, used to highlight patterns and trends in the data that are not readily visible otherwise.
1.2 Background
For those unfamiliar with the English Premier League, it is the most popular domestic club football competition in the world. Formed in 1992 to replace the old First Division as the top tier of the English football league system, it begins in August and runs until May of the following year. Since the 1995/96 season it has consisted of 20 teams, each playing every other team twice: once at their own home ground and once at their opponent's. A team is awarded 3 points for a win, 1 point for a draw, and no points for a loss, and the team with the most points at the end of the season is the overall winner. The three teams with the fewest points are relegated to the league below.
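To make the points system concrete, the following is a minimal sketch of how a league table could be built from match results. The tuple layout, the function names, and the tie-break order (points, then goal difference, then goals scored) are illustrative assumptions; the report does not describe computing a table itself.

```python
from collections import defaultdict

# Points per result, as described above: 3 for a win, 1 for a draw, 0 for a loss.
WIN_POINTS, DRAW_POINTS = 3, 1


def league_table(results):
    """Build a table from (home, away, home_goals, away_goals) tuples.

    Returns (team, points, goal_difference, goals_for) rows from first to last,
    ordered by points, then goal difference, then goals scored (assumed order).
    """
    points = defaultdict(int)
    goal_diff = defaultdict(int)
    goals_for = defaultdict(int)
    for home, away, hg, ag in results:
        # Score the match once from each side's point of view.
        for team, scored, conceded in ((home, hg, ag), (away, ag, hg)):
            goals_for[team] += scored  # also registers teams that never score a point
            goal_diff[team] += scored - conceded
            if scored > conceded:
                points[team] += WIN_POINTS
            elif scored == conceded:
                points[team] += DRAW_POINTS
    rows = [(t, points[t], goal_diff[t], goals_for[t]) for t in goals_for]
    return sorted(rows, key=lambda r: (-r[1], -r[2], -r[3], r[0]))


def relegated(table, n=3):
    """The bottom n teams of a sorted table (three in the Premier League)."""
    return [row[0] for row in table[-n:]]
```

For example, feeding in a handful of results and calling `relegated(table, n=1)` returns the single bottom club; with a full season of 380 results the default `n=3` gives the three relegated clubs.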
2 Methods
2.1 Data Collection
Sourcing a single reliable repository for all of the desired data proved troublesome. Such services do exist; however, due to the “big business” nature of professional sport, access to them was prohibitively expensive. As such, community- and fan-created resources had to be relied upon. Any analysis performed on these data is therefore only as reliable as the data themselves. That said, some measure of insight can still be found.
The data used in this report came mainly from four separate sources:
• Wikipedia list of Manchester United seasons
• Bookmaker data aggregation site Football-Data.co.uk
• Transfer spending data aggregation site transfermarkt.com
• UK football stadium site doogal.co.uk
The collection method varied between sources, depending on how the data were presented. For the Wikipedia articles, the Python programming language and the HTML parsing library BeautifulSoup were used. First, each of the required season links on the main page was followed and its HTML source saved locally. A detailed description of how the data were processed into a workable state is provided in the subsequent section. The details in this dataset included the date, opponent, location, full-time result and score, goal times, and attendance for every Manchester United game in each season.
Figure 1: Example Wikipedia Data
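The link-following and local-saving step described above could look roughly like the sketch below. It swaps BeautifulSoup for the standard library's `html.parser` so that it has no dependencies (with BeautifulSoup, `soup.find_all("a", href=True)` does the same job). The season-article suffix, the User-Agent string, and all function names are assumptions for illustration, not the report's actual code.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urljoin, urlparse
from urllib.request import Request, urlopen

# Assumed suffix of the individual season articles linked from the list page.
SEASON_SUFFIX = "_Manchester_United_F.C._season"


class SeasonLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links that point at season articles."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Match on the path so the list page itself ("..._seasons") is excluded.
        if urlparse(href).path.endswith(SEASON_SUFFIX):
            url = urljoin(self.base_url, href)
            if url not in self.links:  # keep first occurrence, preserve page order
                self.links.append(url)


def season_links(index_html, base_url):
    collector = SeasonLinkCollector(base_url)
    collector.feed(index_html)
    return collector.links


def fetch(url):
    # Placeholder User-Agent (hypothetical); some sites reject urllib's default.
    req = Request(url, headers={"User-Agent": "season-scraper-sketch/0.1"})
    with urlopen(req) as resp:
        return resp.read()


def save_pages(urls, out_dir, fetch=fetch):
    """Save each page's raw HTML once so later parsing can run offline."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = out / (unquote(urlparse(url).path.rsplit("/", 1)[-1]) + ".html")
        if not target.exists():  # skip pages cached on an earlier run
            target.write_bytes(fetch(url))
```

Saving the raw HTML first, as the report describes, means the parsing code can be tweaked and re-run repeatedly without downloading the pages again.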
The bookmaker aggregator data were presented as a series of CSV files. The Chrome plugin Download Master was used to automatically download every Premier League data file for the required seasons. The subsequent section details how these data were processed into a usable format. This dataset included much of the same information as the Wikipedia entries, but for every game in every season rather than only Manchester United's. It also included referee and betting odds data for the 2000/01 season onwards.
For the transfer spending data, complex HTML tables made parsing somewhat difficult. As such, the data were copied and pasted manually into CSV files, per season and team, for later processing. The data gave the total spending of each Premier League club (and of the league as a whole) in each Summer transfer window. The Summer transfer window is one of the two sanctioned periods during which clubs may buy and register new players. Unfortunately, data for the second (January) transfer window were unavailable. That said, the Summer window is the longer of the two and is often when clubs do most of their purchasing.
The football ground location data were presented neatly in table format, so they were simply pasted into a CSV file for later processing. The data provided the stadium name, the team that plays there, the capacity, and the coordinates of the ground.
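As a scripted alternative to a browser download plugin, the per-season file URLs can be generated directly. The sketch below assumes that Football-Data.co.uk stores each Premier League season at a path of the form `mmz4281/<season code>/E0.csv`, where the season code is the two-digit start and end years (e.g. `9596`). That layout is an assumption to verify against the site before use, and the function names are hypothetical.

```python
# Assumed URL layout of Football-Data.co.uk; check it against the site before use.
BASE_URL = "http://www.football-data.co.uk/mmz4281"
PREMIER_LEAGUE_CODE = "E0"  # assumed file code for the English top division


def season_code(start_year):
    """Two-digit start and end years, e.g. 1995 -> '9596', 1999 -> '9900'."""
    return f"{start_year % 100:02d}{(start_year + 1) % 100:02d}"


def premier_league_csv_urls(first=1995, last=2012):
    """One CSV URL per season, from first/first+1 up to and including last/last+1."""
    return [
        f"{BASE_URL}/{season_code(year)}/{PREMIER_LEAGUE_CODE}.csv"
        for year in range(first, last + 1)
    ]
```

With the defaults this yields 18 URLs, covering 1995/96 to 2012/13, which could then be fetched in a loop, for instance with `urllib.request.urlopen`.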
Figure 2: Example Location Data
2.2 Data Processing and Cleansing
The Python library BeautifulSoup was used to parse the HTML files downloaded for all Wikipedia entries. A loop through each of the stored files pulled the required information into dictionary objects indexed by season and game number. Once this was done, each of the entries was searched for anomalous or inconsistent values.
Due to the nature of the data source, many were found. Inconsistent naming of opposition teams (e.g. both “Blackburn” and “Blackburn Rovers” appearing) and garbled character combinations in goal scorer names, caused by incorrect character encoding, were quite common. A combination of modifying the original HTML files and tweaking the parsing code eventually brought the data into a consistent, readable format. Helper functions were then created for easy data retrieval during the analysis and visualisation steps.
For the remaining data, Python's built-in csv module was used to pull all required data into dictionary objects. As above, helper functions were created for ease of access in later stages. Again there was much difficulty with the consistency and quality of the data, which proved to be a recurring issue throughout the collection and processing stage. For example, the same referee was listed as “C Foy”, “CJ Foy”, “Chris Foy”, and “Foy, C” within a single CSV file. It was often simpler to correct these issues manually than to write a small script to do so; this was judged case by case, depending on the scale of the inconsistencies found.
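A dictionary indexed by season and then game number, with small accessor helpers, might look like the sketch below. The row field names (`season`, `game`, `opponent`) and the helper names are hypothetical; the report does not list its actual keys.

```python
from collections import defaultdict


def index_games(rows):
    """Index parsed game records as {season: {game_number: record}}.

    rows: dicts with at least 'season' and 'game' keys (hypothetical names).
    """
    games = defaultdict(dict)
    for row in rows:
        games[row["season"]][int(row["game"])] = row
    return {season: dict(by_number) for season, by_number in games.items()}


def get_game(games, season, number):
    """A single game record, or None if the season or game is missing."""
    return games.get(season, {}).get(number)


def season_games(games, season):
    """All records for a season in game-number order."""
    by_number = games.get(season, {})
    return [by_number[n] for n in sorted(by_number)]
```

Converting the `defaultdict` back to a plain `dict` before returning it means a lookup for a missing season cannot silently create an empty entry.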
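Both kinds of clean-up mentioned above can be expressed as small functions: an alias table that maps each spelling to one canonical team name, and a repair for text that was UTF-8 but was decoded as Latin-1 (for example “SolskjÃ¦r” instead of “Solskjær”). The alias entries and function names are illustrative assumptions. The Latin-1 round trip is one common cause of such garbling, not necessarily the one in the report's data; Windows-1252 garbling would need `"cp1252"` instead.

```python
# Hypothetical alias table: every spelling seen in the source -> canonical name.
TEAM_ALIASES = {
    "Blackburn": "Blackburn Rovers",
    "Blackburn Rovers": "Blackburn Rovers",
}


def canonical_team(name):
    name = " ".join(name.split())  # collapse stray and repeated whitespace
    return TEAM_ALIASES.get(name, name)


def repair_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Already clean, or damaged in some other way: leave it unchanged.
        return text
```

Clean text passes through unchanged, because re-encoding a correctly decoded accented name to Latin-1 does not produce valid UTF-8, so the `except` branch returns the original string.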
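For the referee example above, a normalising rule of “first initial plus surname” would collapse all four spellings into one. The sketch below reads a CSV with `csv.DictReader` and applies that rule. The `Referee` column name is assumed from the Football-Data layout, and the rule itself is a hypothetical choice: it would wrongly merge two referees who share a surname and first initial, which is one reason manual checking remains worthwhile.

```python
import csv
import io


def normalise_referee(raw):
    """Map 'C Foy', 'CJ Foy', 'Chris Foy' and 'Foy, C' to 'C Foy'.

    Assumed rule: first initial + surname. Distinct referees sharing both
    would be merged, so the output still needs a manual check.
    """
    raw = " ".join(raw.split())
    if "," in raw:  # "Surname, Initials" form
        surname, given = (part.strip() for part in raw.split(",", 1))
    else:  # "Given Surname" form; the surname is taken as the last word
        given, _, surname = raw.rpartition(" ")
    return f"{given[:1].upper()} {surname}".strip()


def referees(csv_text, column="Referee"):
    """Normalised referee name for every CSV row that has one."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [normalise_referee(row[column]) for row in reader if row.get(column)]
```

When reading from disk rather than a string, the file should be opened with `newline=""`, as the csv module documentation recommends.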
3 Results and Discussion
3.1 Success vs Rivals and Strong Opponents
It is widely believed among sports fans that results in “big games” show the mettle of true champion teams. As such, it was desired to see how Manchester United performed against their traditional rivals and other strong teams within the league. The wins, losses, and draws against other traditionally successful teams were tallied for both home and away games and subsequently plotted as a percentage of the total number of games. The results can be seen in Figure 3 below:
Figure 3: Wins/Losses/Draws Against Strong Clubs
It is immediately apparent that for many of the bars the green (win) section extends past the 50% mark, whereas no red (loss) section does so from the opposite side. Less surprisingly, teams appear to perform better at their home grounds: the red sections are larger for the away results.
It can also be seen that only two teams appear to have a better win percentage than Manchester United in their head-to-head games: Chelsea and Arsenal, when playing at their respective home grounds. This is unsurprising, as these are the clubs, other than Manchester United, that have won the most Premier League titles.
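The tallying behind Figure 3 could be sketched as follows. The code counts wins, draws and losses against each selected opponent, splits them by venue, and converts the counts to percentages of games played. The match tuple layout and the function names are assumptions for illustration.

```python
from collections import Counter, defaultdict


def outcome(scored, conceded):
    return "W" if scored > conceded else "D" if scored == conceded else "L"


def wdl_percentages(matches, team, opponents):
    """Win/draw/loss percentages for `team` against each opponent, by venue.

    matches: (home, away, home_goals, away_goals) tuples (assumed layout).
    Returns {(opponent, "home" | "away"): {"W": %, "D": %, "L": %}}.
    """
    counts = defaultdict(Counter)
    for home, away, hg, ag in matches:
        if team == home and away in opponents:
            counts[(away, "home")][outcome(hg, ag)] += 1
        elif team == away and home in opponents:
            counts[(home, "away")][outcome(ag, hg)] += 1
    return {
        key: {r: 100 * c[r] / sum(c.values()) for r in "WDL"}
        for key, c in counts.items()
    }
```

Each entry gives one stacked bar. Wins and losses naturally sit at opposite ends of the bar, which matches the green and red sections described above.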
3.2 Monthly Performance Comparison
It is a commonly held belief among Manchester United fans that the team are poor starters. Inconsistent results at the beginning of the season can make it more difficult to chase down opponents later in the season. To test whether this belief holds any credence, the average number of points won per game (maximum 3, minimum 0) in each month was calculated for each season and visualised on a polar plot. The colour of each circle represents the league position at the end of the month, showing how the points won affected the team's overall standing. The results can be seen in Figure 4 below:
Figure 4: Monthly Average Points Won
On brief inspection, there may be some truth to this belief: there are more “small” circles in the first few months of the season than in some of the later months. The greater number of red circles in the early months also suggests a slow start, although this is partly because, with few games played and small points differences between teams, any points won can drastically change the composition of the ladder.
It can be seen that towards the later months of the season, more consistency is achieved. Another thing of note is that when heading into the last month of the season in first place (a green circle in April), they have always ended up winning the league (a green circle in May). This is an indication that they were either too far ahead to catch, or they did not “choke” when it came to the important final few games.
One limitation of this plot is that no circle appears for a month in which no points were won, such as May of the 2000/01 season. This result was unexpected given the club's success. However, they still won the league that season, 10 points clear of their nearest rival. These results may therefore indicate that the club played junior or reserve players to give them experience, since the outcome of the league had already been decided in their favour.
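The monthly averages behind Figure 4 could be computed as in the sketch below, which buckets one season's games by calendar month. The input layout and function names are assumptions. A month of all losses averages 0.0, which explains the missing circle noted above when the average is used directly as the marker size. One option would be to add a small constant to every marker size so that such months still appear.

```python
from collections import defaultdict
from datetime import date


def points(scored, conceded):
    return 3 if scored > conceded else 1 if scored == conceded else 0


def monthly_average_points(games):
    """Average points per game in each calendar month.

    games: (date, goals_for, goals_against) tuples for one team in one season
    (assumed layout). Returns {(year, month): average points per game}.
    """
    totals = defaultdict(lambda: [0, 0])  # (year, month) -> [points, games]
    for day, gf, ga in games:
        bucket = totals[(day.year, day.month)]
        bucket[0] += points(gf, ga)
        bucket[1] += 1
    return {month: pts / n for month, (pts, n) in sorted(totals.items())}
```

Keying on `(year, month)` rather than the month alone keeps December and January in their correct order, because a season spans two calendar years.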
3.3 Geographical Performance Comparison
It was desired to see whether there were any particular geographical locations within England where Manchester United performed better or worse. As such, the location of each opponent's ground was plotted, with the size of each circle representing the number of games played against that opponent and the colour representing the percentage of available points won. This can be seen in Figure 5 below:
Figure 5: Geographic Performance
Manchester United appear to perform well against clubs in the middle and far North of the country, strongly in the greater London region apart from at two grounds (the aforementioned Chelsea and Arsenal), and only averagely in the region surrounding their own ground. However, this appears to reflect mainly the relative strength of the club that plays at each ground rather than any geographical factor.
The English league system includes clubs from all over the country. Every club that spent time in the Premier League during the seasons examined in this report has a circle on the map above. Because circle size reflects the number of games played against Manchester United, who have been in the Premier League since its inception, it also serves as a proxy for how long each club has spent in the league: larger circles indicate longer tenures.
Interestingly, the locations of the teams that have spent time in the Premier League (and of those that have spent the most time there) correlate quite well with the areas of high population density in England, as shown in Figure 6 below:
Figure 6: Population Density of Great Britain
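The per-ground figures for Figure 5 could be aggregated as sketched below. For each opponent, the code counts the games played and the percentage of available points (3 per game) won. It then joins the result with ground coordinates, such as those from the stadium CSV. The tuple layouts and function names are assumptions. The sketch counts every meeting with an opponent, following the description above; restricting the count to away games would change what the colour shows.

```python
from collections import defaultdict


def ground_summary(matches, team="Man Utd"):
    """{opponent: (games played, % of available points won by team)}.

    matches: (home, away, home_goals, away_goals) tuples (assumed layout).
    """
    stats = defaultdict(lambda: [0, 0])  # opponent -> [points won, games]
    for home, away, hg, ag in matches:
        if team not in (home, away):
            continue
        opponent = away if team == home else home
        scored, conceded = (hg, ag) if team == home else (ag, hg)
        stats[opponent][0] += 3 if scored > conceded else 1 if scored == conceded else 0
        stats[opponent][1] += 1
    return {opp: (n, 100 * pts / (3 * n)) for opp, (pts, n) in stats.items()}


def map_points(summary, grounds):
    """Join with {team: (latitude, longitude)}; teams without coordinates are skipped.

    Returns (team, latitude, longitude, games, points %) rows, ready to plot
    with marker size from games and colour from points %.
    """
    return [
        (team, *grounds[team], n, pct)
        for team, (n, pct) in summary.items()
        if team in grounds
    ]
```

Skipping unmatched teams rather than failing makes naming mismatches visible, since a missing circle points to an alias that still needs fixing.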
3.4 Transfer Window Spending
One common belief is that clubs “buy” trophies: that they spend large sums of money on the best staff, facilities, and players, which smaller clubs cannot afford, giving them a much greater chance of success. To show by how much the bigger clubs outspend their smaller counterparts, a visualisation of each selected club's spending as a fraction of total league spending was created. This is shown in Figure 7 below:
Figure 7: Summer Transfer Window Spending for Selected Clubs
Yellow lines represent years when Manchester United won the Premier League
One thing that is immediately evident is that spending across the league as a whole trends upward over time. This is hardly surprising, given the record transfer fees that have regularly made news in recent years.
Also of interest are the large spending increases starting in 2002 for Chelsea and in 2007 for Manchester City. These coincide approximately with each club's purchase by a billionaire investor. They are also followed by seasons in which Manchester United failed to win the Premier League, which suggests tighter competition among the bigger clubs.
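The quantity plotted in Figure 7, each selected club's share of the league's total Summer spending per season, could be computed as sketched below. The nested-dictionary input and the function name are assumptions. The sketch sums all clubs to get the league total. Where the source also reports a league-wide figure, as the transfer data here do, comparing the two is a quick consistency check.

```python
def spending_share(spend, selected):
    """{season: {club: fraction of that season's total league spending}}.

    spend: {season: {club: amount}} covering every club in the league that
    season (assumed layout); selected: clubs to report.
    """
    shares = {}
    for season in sorted(spend):
        clubs = spend[season]
        total = sum(clubs.values())
        shares[season] = {
            club: (clubs.get(club, 0) / total if total else 0.0) for club in selected
        }
    return shares
```

Plotting fractions rather than raw amounts removes the league-wide upward trend described above, so changes in a club's relative outspending stand out.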
3.5 Win Percentage in Adverse Conditions
A great number of sporting clichés concern how a team performs under pressure: how they respond “when the chips are down”, possessing a “fighting spirit”, or being labelled “comeback kings” are just a few. In this light, the win percentage after trailing at half time was calculated for each club and presented as a ranked bar plot. This can be seen in Figure 8 below:
Figure 8: Win Percentage After Trailing at Half Time
Manchester United are second only to Arsenal in this regard, with no other club coming close to the 20% mark set by these two. Note, however, that this excludes the percentage of draws after trailing at half time and the percentage of wins after drawing at half time. Including these gives largely similar results, with some reordering of the lower clubs.
In the same vein, the win percentage when the odds favoured the opposition was calculated and a ranked bar plot created, as in Figure 9 below:
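The ranking in Figure 8 could be produced along the lines of the sketch below. For every club, it counts the games in which the club trailed at half time and how many of those it went on to win. The column names `HomeTeam`, `AwayTeam`, `FTHG`, `FTAG`, `HTHG` and `HTAG` are assumed from the Football-Data CSV layout, and the function name is hypothetical.

```python
from collections import defaultdict


def comeback_win_rates(rows):
    """[(club, % of games won after trailing at half time)], best first.

    rows: dicts with Football-Data style keys (assumed): HomeTeam, AwayTeam,
    FTHG, FTAG (full-time goals) and HTHG, HTAG (half-time goals).
    """
    tally = defaultdict(lambda: [0, 0])  # club -> [comeback wins, games trailing at HT]
    for r in rows:
        fthg, ftag, hthg, htag = (int(r[k]) for k in ("FTHG", "FTAG", "HTHG", "HTAG"))
        # Check each side in turn, from its own point of view.
        for club, ht_for, ht_against, ft_for, ft_against in (
            (r["HomeTeam"], hthg, htag, fthg, ftag),
            (r["AwayTeam"], htag, hthg, ftag, fthg),
        ):
            if ht_for < ht_against:
                tally[club][1] += 1
                if ft_for > ft_against:
                    tally[club][0] += 1
    rates = [(club, 100 * won / n) for club, (won, n) in tally.items()]
    return sorted(rates, key=lambda x: (-x[1], x[0]))
```

Counting draws as well, as the paragraph above suggests, would only need a second counter incremented when `ft_for == ft_against`.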
Figure 9: Win Percentage When Odds Favour Opposition
Note that this graph does not take into account the size of the difference in the odds. It is likely that when the odds were against Manchester United, the difference was smaller than for some of the other clubs. Even so, bookmakers stake their livelihood on pricing outcomes correctly, and in nearly 30% of the games in which they favoured the opponent, Manchester United won anyway, more often than any other club.
Looking at things a different way, since there are two sides in every football match, it was important to consider how well a team prevented its opponent from coming from behind to snatch victory. As such, the percentage of wins when leading at half time was calculated, ranked, and plotted. This is shown in Figure 10 below:
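A minimal sketch of the Figure 9 measure follows. Among a club's games in which the bookmaker's decimal odds for the opponent winning were lower (i.e. the opponent was favoured), it computes the percentage the club won. The column names (`HomeTeam`, `AwayTeam`, `FTR`, and odds columns such as `B365H`/`B365A`) are assumed from the Football-Data layout. The bookmaker prefix is a parameter because the report does not say which bookmaker's odds were used, and early-season files may lack a given bookmaker. As in the graph, the size of the odds gap is ignored.

```python
def upset_win_rate(rows, team, prefix="B365"):
    """% of `team`'s games won when the odds favoured the opponent, else None.

    rows: dicts with assumed Football-Data keys HomeTeam, AwayTeam, FTR
    ('H'/'D'/'A') and decimal odds columns prefix+'H' / prefix+'A'.
    """
    underdog = wins = 0
    for r in rows:
        if team not in (r["HomeTeam"], r["AwayTeam"]):
            continue
        if not r.get(prefix + "H") or not r.get(prefix + "A"):
            continue  # no odds recorded for this game
        at_home = r["HomeTeam"] == team
        own_odds = float(r[prefix + ("H" if at_home else "A")])
        opp_odds = float(r[prefix + ("A" if at_home else "H")])
        if opp_odds < own_odds:  # lower decimal odds = bookmaker's favourite
            underdog += 1
            if r["FTR"] == ("H" if at_home else "A"):
                wins += 1
    return 100 * wins / underdog if underdog else None
```

Returning `None` rather than 0 for a club that was never an underdog keeps such clubs out of the ranking instead of placing them last.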
Figure 10: Win Percentage After Leading at Half Time
Manchester United close out games with nearly 90% efficiency, a full 5 percentage points more than any of their closest rivals. Again, when draws are taken into consideration the ranking remains very similar, with Manchester United still leading the pack.
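The Figure 10 measure, together with the variant that also counts draws, could be computed for one club as sketched below. It uses the same assumed Football-Data columns as the half-time comeback calculation, and the function name is hypothetical.

```python
def hold_rates(rows, team):
    """When `team` led at half time: (% of those games won, % not lost).

    rows: dicts with assumed Football-Data keys HomeTeam, AwayTeam,
    HTHG, HTAG, FTHG, FTAG. Returns None if the team never led at half time.
    """
    led = won = drawn = 0
    for r in rows:
        if team not in (r["HomeTeam"], r["AwayTeam"]):
            continue
        ht_for, ht_against = int(r["HTHG"]), int(r["HTAG"])
        ft_for, ft_against = int(r["FTHG"]), int(r["FTAG"])
        if r["AwayTeam"] == team:  # view the score from the team's side
            ht_for, ht_against = ht_against, ht_for
            ft_for, ft_against = ft_against, ft_for
        if ht_for > ht_against:
            led += 1
            if ft_for > ft_against:
                won += 1
            elif ft_for == ft_against:
                drawn += 1
    if not led:
        return None
    return 100 * won / led, 100 * (won + drawn) / led
```

The second value corresponds to “taking draws into consideration”: it is the share of half-time leads the team did not lose.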
4 Conclusion
The data and visualisations presented paint a clear picture of a dominant club in the English Premier League. This, however, was already apparent and needed little exposition. What was interesting was the level of domination and some of the possible reasons for it. Manchester United appear to perform well against all opposition, whether playing at home or elsewhere in the country; they perform consistently well towards the end of seasons, when the finish line is in sight; and they win in situations where they are not expected to, coming from behind at half time more often than all clubs but Arsenal and beating the bookmakers' odds more often than any other club.
Further exploration of the available data, and potentially finding alternative data sources that provide micro-level statistics about individual games (possession, shots on target, pass completion percentage, etc.), may provide more insight into the how and why of their success; for now, however, that is outside the scope of this report.
References
[1] “List of Manchester United F.C. seasons.” http://en.wikipedia.org/wiki/List_of_Manchester_United_F.C._seasons, 2014. Accessed: 2015-03-15.
[2] “England Football Results Betting Odds.” http://www.football-data.co.uk/englandm.php, 2015. Accessed: 2015-03-15.
[3] “Summer transfers - Premier League.” http://www.transfermarkt.com/premier-league/sommertransfers/wettbewerb/GB1/, 2014. Accessed: 2015-03-20.
[4] C. Bell, “UK football stadiums.” http://www.doogal.co.uk/FootballStadiums.php, 2015. Accessed: 2015-03-20.
[5] P. C. Ross Biddiscombe and J. Hayden, The Official Encyclopedia of Manchester United. Simon & Schuster, 2011.