Visualizing the 2018 World Cup

Overview

I researched, designed, and developed an interactive data visualization for the FIFA 2018 World Cup, highlighting team competition and play patterns. Our user personas include soccer fans, coaches, and sports journalists. Play with the live version here.

Time

Aug–Dec 2018

People

Jialin Ma: Developer
Huan Deng: User Researcher
Me: UI Designer & Developer

Soccer Hero Image: isometric rendering of desktop computer with project interface designs

The Final Visualization

My team and I used d3.js to implement my designs, which contain a soccer field diagram that visualizes player movement and a slider that chronologically represents the whole match. The user can look at all player movements, or track the movement of a specific player.

Moving a time window on the slider visualizes all player movement in the match
With a specific player selected, the moving time window only shows his movement in the match

Impact

This project was selected as the best student project in our cohort. It also helped my teammates and me get offers from places like Airbnb, MIT Labs, and Frog.

Understanding Data

A single match already generates too much data for analysis. To help pick which data variables would be valuable to our users, we looked at three different personas: soccer fans, coaches, and sports journalists.

Persona 1

Soccer Fans

  • Want to know a team’s lineup, formation, rival, and attack rhythm
  • Frustrated by the effort required to gather team tactics information from scattered sources

Persona 2

Soccer Coaches

  • Want to know player performance and the other team’s tactics
  • Frustrated by the difficulty of distilling essential tactical information from raw data and messy notes

Persona 3

Sports Journalists

  • Want to find dramatic moments and/or trace particular star players’ performance in a match
  • Frustrated by having no time to go through a whole match

Choosing Data Variables

With the three personas’ goals and frustrations in mind, we came up with a very large number of candidate data variables and spent quite some time deciding which ones to select for our visualization (a reduced version below).

  • Period
  • Possession team
  • Duration
  • Tactics
      • Lineup
      • Formation
      • Player position
  • Player
  • Under pressure
  • Team
  • Event type
      • Block
          • Offensive
          • Deflection
          • Save block
      • Out
      • Dribble
          • Offensive
          • Deflection
      • Substitution
          • Off
          • On
              • ID
              • Name
      • Foul committed
          • No
          • Yes
              • Offensive
              • Not offensive
      • Ball recovery
          • Offensive
          • Recovery failure
      • Pass
          • Recipient
          • Angle
          • Height
          • Length
          • End location
          • Assisted shot ID
          • Pass backheel
          • Deflected
          • Miscommunication
          • Shot assist
          • Goal assist
      • Shot
          • Key pass ID
          • First time
          • Follows dribble
          • Deadball
          • End location
          • Open goal
          • Redirect
          • Freeze frame
  • Event location
  • Timestamp
  • Related events

Picking Visualization Types

Having chosen data variables, the next step was to give them form. We categorized the variables into three big themes and came up with a rough idea of what best represented them:

Variable Category | Data Visualization Type
Player (shows player communication) | Network Visualization
Time (allows the user to filter events) | Time Slider
Location (displays real-time player position) | Location Visualization

The time slider was easy, but what type of diagram should we actually choose for the network and the location visualizations? We looked at different types of diagrams and compared their pros and cons.

Network – Arc

Illustration of an arc diagram
Pros

Looks cool when there are a lot of connections

Cons

Not space-efficient enough to show all 22 players

Network – Alluvial

Illustration of an alluvial diagram
Pros

Space-efficient enough to show all 22 players

Cons

Form is better suited to displaying rivalry than collaboration

Network – Matrix

Illustration of a matrix
Pros

Looks cool and is space-efficient too

Cons

Data points start getting repetitive fast, making it hard to read and distill insights

Network – Chord 

Illustration of a chord diagram
Pros

Looks cool, is space-efficient, provides an overview, and can still show player interactions

Cons

None significant

Location – Heat Map

Illustration of a heat map
Pros

Maps event frequencies to location well

Cons

Doesn’t allow discerning single player movement or events

Location – A Literal Soccer Field 

Illustration of a soccer field
Pros

Provides an overview of event frequencies while still showing single player movements or events

Cons

None significant

We decided to go with the chord diagram and the literal soccer field diagram because, unlike the others, they had no significant shortcomings.
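To make the chord choice concrete: d3's chord layout takes a square matrix as input, where each cell holds the flow from one node to another. The sketch below shows one way such a matrix could be built from pass events. The event shape (`from`, `to`), the player labels, and the function name `buildPassMatrix` are assumptions for illustration, not the project's actual data or code.

```javascript
// Made-up sample data: each record says who passed to whom.
// The field names `from` and `to` are assumptions, not the real dataset schema.
const samplePasses = [
  { from: "Player A", to: "Player B" },
  { from: "Player B", to: "Player A" },
  { from: "Player A", to: "Player B" },
  { from: "Player C", to: "Player A" },
];

// Build a square matrix where matrix[i][j] counts passes from player i to player j.
function buildPassMatrix(passes) {
  // Collect player names in first-seen order so row and column indices stay stable.
  const players = [];
  const indexOf = new Map();
  for (const { from, to } of passes) {
    for (const name of [from, to]) {
      if (!indexOf.has(name)) {
        indexOf.set(name, players.length);
        players.push(name);
      }
    }
  }

  // Start with an all-zero N x N matrix, then count each pass.
  const matrix = players.map(() => players.map(() => 0));
  for (const { from, to } of passes) {
    matrix[indexOf.get(from)][indexOf.get(to)] += 1;
  }
  return { players, matrix };
}

const passNetwork = buildPassMatrix(samplePasses);
console.log(passNetwork.players, passNetwork.matrix);
```

A matrix like this could then be handed to a chord layout, with `players` supplying the labels around the circle.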

First Attempt

Though we had chosen the appropriate types of visualization, it was still no small feat to combine them in a sensible way. We quickly made our first design:

During critique, we realized that it was too narrowly optimized for showing individual player locations and was unfit for displaying other metrics like team tactics. It was also visually disjointed: the two orbs on the right and the soccer field on the left weren’t connected, so the reader had to spend too much effort to understand the design.

(Re)introducing the Time Slider

To establish a clearer connection between the teams and the soccer field diagram, we brainstormed and decided to replace the orbs with the time slider, which was originally on top of the soccer field.

Not long after we made the decision, we also stumbled upon a small piece of UI on Airbnb that visualizes housing prices according to a user-selected time frame. This immediately inspired us to do the same to our time slider by adding pass frequency.

This piece of data visualization on Airbnb allows the user to see real-time price trends by dragging the slider.
We spent an entire sleepless night intensely debating solutions.

Pass frequency turned out to be a powerful dimension that reached beyond the time slider and pulled the two pieces of the visualization together. On its own, it could help users deduce team tactics. Combined with time, it naturally guided the viewer’s focus. Better still, it established brushing and linking across the whole interface: moving the time window on the slider updates the soccer field, so the user can see the same match from different yet simultaneous perspectives.
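The two ideas in this section, a pass frequency series over the slider and a brushed time window that drives the field view, can be sketched in a few lines. This is a minimal illustration under assumed names: the event fields (`type`, `team`, `player`, `minute`), the sample events, and both function names are hypothetical, and the real interface would feed the results into d3 rendering code.

```javascript
// Made-up sample events; field names are assumptions, not the real dataset schema.
const sampleEvents = [
  { type: "Pass", team: "home", player: "A", minute: 1 },
  { type: "Pass", team: "home", player: "B", minute: 3 },
  { type: "Shot", team: "home", player: "A", minute: 4 },
  { type: "Pass", team: "away", player: "C", minute: 7 },
  { type: "Pass", team: "home", player: "A", minute: 12 },
];

// Count one team's passes per fixed-width time bin.
// The resulting array is the series that would be drawn over the time slider.
function passFrequency(events, team, binMinutes, matchMinutes) {
  const bins = new Array(Math.ceil(matchMinutes / binMinutes)).fill(0);
  for (const e of events) {
    if (e.type !== "Pass" || e.team !== team) continue;
    const i = Math.floor(e.minute / binMinutes);
    if (i >= 0 && i < bins.length) bins[i] += 1;
  }
  return bins;
}

// Linking step: given the brushed window [start, end) and an optional player,
// return only the events the soccer field view should display.
function eventsInWindow(events, [start, end], player = null) {
  return events.filter(
    (e) =>
      e.minute >= start &&
      e.minute < end &&
      (player === null || e.player === player)
  );
}

console.log(passFrequency(sampleEvents, "home", 5, 15));
console.log(eventsInWindow(sampleEvents, [0, 5], "A"));
```

Keeping the window filter as a pure function means the same call can serve both the all-players view and the single-player view; only the `player` argument changes.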

The finalized mockup

Addressing the Hair of Strings Problem

With the time slider taken care of, we moved on to implementing the soccer field diagram. However, showing player positions throughout a match presented the classic Hair of Strings problem: the data is so dense that the visualization becomes almost impossible to read. After a long debate, we decided to average each player’s positions over a fixed-length moving time window.

Real-Time Position
If we do nothing and show unaltered real-time player positions, the Hair of Strings problem is very much present.
Position by Line-up
Grouping player positions by line-up solves the Hair of Strings problem and clearly shows team tactics, but it cannot show players’ changing roles.
Averaged Position
Averaging player positions reduces the Hair of Strings problem, clearly shows team tactics, and still allows the user to discern individual players’ roles.
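The averaged-position approach amounts to a moving-window mean over each player's location samples. Here is a minimal sketch, assuming samples with a time `t` in seconds and pitch coordinates `x` and `y`. The sample values, the window and step sizes, and the function names are hypothetical and chosen only to illustrate the idea.

```javascript
// Made-up position samples for one player: time in seconds, x/y in pitch units.
const sampleTrack = [
  { t: 0, x: 10, y: 20 },
  { t: 10, x: 20, y: 40 },
  { t: 20, x: 30, y: 30 },
  { t: 40, x: 50, y: 10 },
];

// Average all samples whose time falls inside [windowStart, windowStart + windowLength).
// Returns null when the player has no samples in that window.
function averagePosition(samples, windowStart, windowLength) {
  const inWindow = samples.filter(
    (s) => s.t >= windowStart && s.t < windowStart + windowLength
  );
  if (inWindow.length === 0) return null;
  const sum = inWindow.reduce(
    (acc, s) => ({ x: acc.x + s.x, y: acc.y + s.y }),
    { x: 0, y: 0 }
  );
  return { x: sum.x / inWindow.length, y: sum.y / inWindow.length };
}

// Slide the window across the match in fixed steps,
// producing one averaged point per step that contains data.
function movingAverageTrack(samples, windowLength, step, matchLength) {
  const track = [];
  for (let start = 0; start + windowLength <= matchLength; start += step) {
    const p = averagePosition(samples, start, windowLength);
    if (p !== null) track.push({ t: start, ...p });
  }
  return track;
}

console.log(movingAverageTrack(sampleTrack, 30, 10, 50));
```

A longer window smooths the track more aggressively, at the cost of hiding short bursts of movement, so the window length is a readability trade-off rather than a fixed constant.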

A/B Testing the Navigation Portal

In designing the navigation portal, I analyzed the pros and cons of traditional tournament views and dropdown lists. Both forms have significant flaws: they make it difficult to find a specific match, and they often show repetitive, redundant information. I used a chord diagram instead, which allows quick, precise searching and shows less redundant information. I A/B tested two ways of arranging the teams, alphabetical order and group order, each of which has its own advantages and disadvantages.

Alphabetical Order
Group Order
Alphabetical order makes it easier to precise-search a specific team, while group order resembles the tournament layout that’s already familiar to soccer fans.
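The two arrangements differ only in the sort key used to place teams around the chord diagram. The sketch below uses a four-team subset of the tournament. The object shape, the function names, and the choice to sort alphabetically within each group are assumptions made for illustration.

```javascript
// A small subset of 2018 World Cup teams with their group letters.
// The object shape ({ name, group }) is an assumption for this sketch.
const sampleTeams = [
  { name: "Spain", group: "B" },
  { name: "Argentina", group: "D" },
  { name: "France", group: "C" },
  { name: "Portugal", group: "B" },
];

// Variant A: order teams by name only.
function alphabeticalOrder(teams) {
  return [...teams].sort((a, b) => a.name.localeCompare(b.name));
}

// Variant B: order teams by group letter, then by name within a group
// (the within-group tie-break is an assumption).
function groupOrder(teams) {
  return [...teams].sort(
    (a, b) => a.group.localeCompare(b.group) || a.name.localeCompare(b.name)
  );
}

console.log(alphabeticalOrder(sampleTeams).map((t) => t.name));
console.log(groupOrder(sampleTeams).map((t) => t.name));
```

Both functions copy the array before sorting, so the same team list can feed either variant without side effects.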

I conducted A/B testing with three soccer fan friends who have watched soccer for 6 months, 3 years, and 10 years, corresponding to the three user personas described above. I asked them to find the Argentina-France and Spain-Portugal matches, and I recorded their completion times and verbal preferences. In the end, although completion times were roughly the same for both versions, all three preferred the group order version for its resemblance to the tournament layout.

Live Site with Real Data

To demonstrate how the finalized tool would be used in context, below is the Spain-Russia round-of-16 match. Even without seeing the legends, soccer fans or coaches can easily tell Spain (yellow) from Russia (blue) because of the continuous run of short spikes in the pass frequency chart over the time slider, together with Spain’s long possession time.

Overview
Spain is known for its tiki-taka tactic style: constant, short, and frequent passes until approaching the goal. We can clearly see this in the frequent short spikes on the time slider’s pass frequency visualization.
The user can also zoom in on a specific player, such as Andrés Iniesta Luján, shown below. Known as “the brain of the team,” the midfielder organizes attacks and executes tactics. Since he is the hub of passes, he typically moves around midfield.

Play with the live version here.

Conclusion

This project made me think much more deeply than before about the pros and cons of specific types of data visualizations. More importantly, it taught me an important lesson about communicating effectively with developers in order to respond to technical reality, schedule pressures, and how designs would look when animated with real data.
