World Cup 2018 Data Visualization

Visualizations done with ggplot2 and the Wes Anderson color palette on the 16 teams in the knockout phase

Joy Harjanto
Jun 29, 2018

The World Cup is a monumental global event that brings devoted fans and soccer nerds from all over the world together to watch and celebrate the sport.

Drama, suspense and surprise marked the World Cup’s group-stage matches. Germany, the 2014 World Cup champion, failed to advance after South Korea scored two goals in stoppage time. Argentina, the 2014 World Cup runner-up, barely qualified for the round of 16 after drawing one and losing the other of its first two games.

The outcomes of future matches are uncertain, but the tournament will continue to captivate a global audience with its riveting and intense performances, history, and rivalries.

Side note: The prospect of Cristiano Ronaldo and Lionel Messi facing each other in the quarterfinals, in what may be their last World Cup, is something to look out for!

Sports and data analytics frequently go hand in hand. There are many historical FIFA data sets available on Kaggle and elsewhere on the web for soccer fans and statisticians to analyze.

I want to hone my R skills, so I decided to make the World Cup the centerpiece of a data analytics and visualization effort. For my first project, I used the results of the 16 countries that reached the knockout stage to produce a straightforward visualization with the ggplot2 package. Hadley Wickham, a statistician and chief scientist at RStudio, created ggplot2 to help R users build better visualizations.

install.packages("ggplot2") # install ggplot2
library(ggplot2) # load ggplot2

ggplot2 can use color palettes provided by other R packages. I used the Wes Anderson color palettes from the wesanderson package to spice up my visualizations. I have included the code snippet below for those interested in accessing them:

install.packages("wesanderson") # install the color palettes
library(wesanderson) # load the palettes
names(wes_palettes) # list the names of the available palettes
wes_palette("Darjeeling1") # display the Darjeeling1 color scheme
Darjeeling 1 color scheme
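As a small sketch of what a palette contains: each one is a character vector of hex color codes, and wes_palette() can return either the first few colors or an interpolated set. The variable names here are only illustrative.

```r
library(wesanderson)

# A palette is a character vector of hex color codes.
darjeeling <- wes_palette("Darjeeling1")
print(as.character(darjeeling))

# Request only the first three colors of the palette.
first_three <- wes_palette("Darjeeling1", 3)

# Interpolate when more colors are needed than the palette defines.
eight_colors <- wes_palette("Darjeeling1", 8, type = "continuous")
print(length(eight_colors))
```

The "continuous" type is handy when a plot has more groups than the palette has colors.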

For the first visualization, I used the GrandBudapest2 palette and the group-stage performance of the 16 countries that advanced to the knockout round.

Data Frame for Visualization 1
Group-Stage Points of the 16 Knockout-Stage Countries

Uruguay, Croatia and Belgium are the only three countries to have won all three group games. They lead their respective groups with the maximum of 9 points. Argentina and Japan are the weakest countries in the round of 16 with the minimum of 4 points. Both countries won one game and drew one game.

The mean of the points accumulated by the 16 countries is 6.1875, with a variance of 2.695833. Since a win earns 3 points, a mean just above 6 corresponds to about two wins, which suggests that most of the countries that qualified won at least two games. The variance of 2.695833, a standard deviation of roughly 1.64 points, reflects how uneven the performances were across the countries.
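These summary statistics can be checked with a few lines of base R. The point totals in this sketch are reconstructed from the group-stage standings rather than copied from the data frame pictured above, so treat them as an assumption; they do reproduce the mean and variance quoted here.

```r
# Group-stage points of the 16 knockout-stage teams, sorted from high to low.
# Assumption: reconstructed from the final group standings, not taken from
# the data frame shown in the article.
points <- c(9, 9, 9, 7, 7, 6, 6, 6, 6, 6, 5, 5, 5, 5, 4, 4)

mean_points <- mean(points) # arithmetic mean
var_points <- var(points)   # sample variance (divides by n - 1)
sd_points <- sd(points)     # standard deviation, in points

print(c(mean = mean_points, variance = var_points, sd = sd_points))
```

Note that var() computes the sample variance, dividing by n - 1 rather than n, which is the version that matches the figure quoted above.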

I decided to color code the countries by continent to illustrate the breakdown of the participating countries. The countries in the knockout stage are predominantly European, while Japan is the only Asian country.
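A minimal sketch of such a continent-coded bar chart might look like the following. The data frame, its column names, and the continent labels are assumptions reconstructed from the group-stage standings; they are not the exact data or code behind the figure above.

```r
library(ggplot2)
library(wesanderson)

# Assumed reconstruction of the group-stage results (not the original data frame).
teams <- data.frame(
  country = c("Uruguay", "Russia", "Spain", "Portugal", "France", "Denmark",
              "Croatia", "Argentina", "Brazil", "Switzerland", "Sweden",
              "Mexico", "Belgium", "England", "Colombia", "Japan"),
  points = c(9, 6, 5, 5, 7, 5, 9, 4, 7, 5, 6, 6, 9, 6, 6, 4),
  continent = c("South America", "Europe", "Europe", "Europe", "Europe",
                "Europe", "Europe", "South America", "South America",
                "Europe", "Europe", "North America", "Europe", "Europe",
                "South America", "Asia")
)

# One bar per country, ordered by points and filled by continent.
# GrandBudapest2 has four colors, one for each continent present.
p <- ggplot(teams, aes(x = reorder(country, points), y = points, fill = continent)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = wes_palette("GrandBudapest2")) +
  labs(x = NULL, y = "Group-stage points", fill = "Continent")
print(p)
```

Mapping fill to continent lets ggplot2 build the legend automatically, and scale_fill_manual() swaps the default colors for the palette.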

Past results are no indicator of future performance. This is true for most things in life, especially sports, where there is always room for upsets and deviations.

For the second visualization, however, I use the Darjeeling1 palette to show how the 16 countries have fared in past World Cups.

Data Frame for Visualization 2
Past Successes in World Cup based on Qualifications and Tournaments Won

All 16 countries have previously participated in a World Cup, but only 6 of them have previously won one. Most of the European countries have participated in at least 10 World Cups.

Brazil has the strongest record: it has participated in all 20 previous World Cups and won 5 of them, the most of any country.
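As a rough sketch of one part of this comparison, the six past champions among the 16 countries can be plotted by titles won before 2018. The data frame and its column names are assumptions made for illustration; the figure above also includes the number of past participations, which this sketch leaves out.

```r
library(ggplot2)
library(wesanderson)

# Assumed data: World Cup titles won before 2018 by the six past champions
# among the 16 knockout-stage countries.
champions <- data.frame(
  country = c("Brazil", "Uruguay", "Argentina", "France", "Spain", "England"),
  titles = c(5, 2, 2, 1, 1, 1),
  continent = c("South America", "South America", "South America",
                "Europe", "Europe", "Europe")
)

# Bars ordered from most to fewest titles, filled with two Darjeeling1 colors.
p <- ggplot(champions, aes(x = reorder(country, -titles), y = titles, fill = continent)) +
  geom_col() +
  scale_fill_manual(values = wes_palette("Darjeeling1", 2)) +
  labs(x = NULL, y = "World Cups won before 2018", fill = "Continent")
print(p)
```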

Thanks for reading. I am still learning about R and data visualization, so please feel free to message me if you have any feedback! :)
