OK, so you know the basics of data viz. But do you know how to create a great visualization? Can your data viz be digested in 5 seconds or less? This webinar will show you the essential tricks to shape data so that people instinctively "get it."
Here’s a little summary of some of the subjects we cover in this webinar.
While there are at least 24 techniques for creating good visualizations, this video focuses on the techniques that most market researchers don't know.
Download the ebook: How to Create Meaningful, Memorable, Instantly Understandable Visualizations
Thank you for joining this, the second in our two-webinar series on data visualization.
It's a standalone webinar, so don't worry if you missed the first one. You can watch it on our website.
The (3 to 5) second rule
When we look at a visualization we give it the benefit of the doubt. This benefit lasts for 3 to 5 seconds. Once that time's passed, we're bored. We move on. The chance to communicate has been lost.
An artist friend tells me this is not great art. It's a cartoon. But, in only a matter of seconds we get it. It talks to us. We are captivated. We can extract meaning from it.
But this? It looks like bricks. I think this is why most people don't like modern art. They can't instinctively extract meaning from it. It's too hard. They just move on.
Why is that? Let's do some thought experiments.
You have five seconds
You have 5 seconds to remember what is on the next slide.
What did it say?
Can you remember?
You have 5 seconds
Let's try it again. You've got 5 seconds.
What was the word?
If you are of a certain age, you found the second word much easier to remember, as it was a word from childhood. Even if you don’t know your Mary Poppins, there’s a good chance you can remember some of the word, as it was made up of words you know.
Our brains find it easier to read and understand things that are familiar.
Successful visualizations are ones that are in some sense familiar.
As I mentioned in the previous webinar, there are at least 24 techniques for creating good visualizations.
Today I am going to explore a small set of the more advanced techniques. I'm focusing on the techniques that I find most market researchers don't know.
The simplest technique is to create something visually striking. If it's dramatic we can remember it.
This is fun. But, it's not really great. It makes you remember octopuses with orange hair, rather than the story in the data.
This one tries to grab attention in a different way. It uses a design inspired by the subject matter.
I think it's beautiful. This makes it memorable.
But, I actually think it's a poor visualization.
When I look at it I see stairs. Stairs going down. It inadvertently tells my brain the opposite of the story it is trying to tell.
The use of the incomplete number on the bottom bar accentuates this, as the visualization has failed to use redundant encoding.
Iraq's blood toll
This one's a bit hard to read, but it's a lot cleverer than the one before. Again, it uses the simple technique of a visual that's tied to the subject matter.
But, it does something a bit cleverer than that as well. The basic shape of this upside-down column chart forms the type of pattern we get when fluids, like blood or paint, spill over the edge of a container.
So, the smart thing it's done is relate the pattern in the data to a pattern we know.
It's not just attracting attention. It's a mnemonic of sorts.
In terms of design quality, this one's a bit paint by numbers. A wallet and the color of money are used to relate to the subject matter.
But, at a deeper level it's doing something cleverer. The core bit of the visualization is an idea we are familiar with.
A drop in the ocean.
A needle in a haystack.
A speck of dust.
It emphasizes that the initial investment is dwarfed at a monumental level by the outcome.
The examples we've looked at here all rely on having a creative spark. In the rest of this webinar I will be exploring tried and true techniques that are easy to apply and don't rely on you discovering the artist within.
Banking to 45 degrees
Banking to 45 degrees is one of my favorite techniques.
A case study. Not market research data, but too beautiful not to show.
Dark spots appear on the sun. Sunspots. They may last for a few days or months.
This data set tracks the number of sunspots per month.
I've got 3,177 data points.
I'll plot them as an area chart.
Insert > Area chart
Output in pages: sunspot.month
What can you see?
First, it oscillates, going up and down every 11 years.
Second, there's a bigger oscillation in the trend data, which goes down from 1800 to 1820, then down again in the 1880s and so on.
What else can you see?
OK. Now for the magic. I'm going to shrink the chart.
What can you see now? Squint. It's worth it.
That's right. The sunspots increase at a much faster rate than they decrease.
Start at bump two to the left of 1800.
See here, it spikes up. Then it decreases more gradually.
This is a common pattern.
Banking to 45 degrees
It turns out if we change the height and width so that the average slope of the data we want to examine is about 45 degrees, we create a visualization that's much easier to read.
No need to be too mathematical. Your eye is good enough.
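If you would rather compute the aspect ratio than eyeball it, here is a minimal sketch in Python. It assumes the median-absolute-slope rule, which is one common way of banking and not necessarily what any particular tool does. The function name `bank_to_45` and the zigzag data are made up for illustration.

```python
from statistics import median


def bank_to_45(x, y):
    """Height/width ratio at which the median segment is drawn at 45 degrees.

    Assumes x is strictly increasing and y is not constant.
    """
    x_range = max(x) - min(x)
    y_range = max(y) - min(y)
    slopes = []
    for i in range(len(x) - 1):
        # Rescale each segment to a unit square so the units do not matter.
        dx = (x[i + 1] - x[i]) / x_range
        dy = (y[i + 1] - y[i]) / y_range
        if dy != 0:  # flat segments carry no slope information
            slopes.append(abs(dy / dx))
    # On a chart of width w and height h the on-screen slope is s * h / w,
    # so the median slope equals 1 (45 degrees) when h / w = 1 / median(s).
    return 1 / median(slopes)


# Hypothetical zigzag series standing in for an oscillating data set.
months = list(range(9))
counts = [0, 1, 0, 1, 0, 1, 0, 1, 0]
print(bank_to_45(months, counts))  # 0.125, so make the chart 8 times wider than tall
```

A result well below 1 is the numeric version of "shrink the chart": a rapidly oscillating series wants a short, wide plot.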
Attitudes to political institutions
Here's a second example. We're looking at some survey data showing attitudes toward various institutions.
Which is the better chart?
Clearly the one on the right.
In it, I've done four things. The data has been banked to 45 degrees, as just discussed.
I've also used lots of grey, emphasized the key pattern I'm interested in, and reduced the number of eye lookups by getting rid of the legend.
Note the really cool thing about banking to 45 degrees.
It leads to smaller charts. That is, the chart no longer takes up nearly as much room. So, we can add in more commentary or other insights if we wish.
Why does the banking work? Here's my theory.
Lean back. What do you see?
Lean in. What do you see?
Lean in. What do you see? A gremlin. Look at the right, there's a toucan as well with a big claw.
If we are 50 cm…
For optimal viewing, we want an image to span an angle of about 4 to 6 degrees at the eye. This means that if we are viewing from about 20 inches/50 cm away, our ideal image is only about 2 inches/5 cm high.
This goes some way to explaining why banking works. We want key features of the data to be visible in a small area, about 2 inches by 2 inches. Resizing things so that they are at 45 degrees achieves this most efficiently.
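The arithmetic behind those numbers can be checked with a few lines of Python. This is only a sketch of the standard visual-angle geometry (height = 2 × distance × tan(angle / 2)); the function name `image_height` is made up for illustration.

```python
import math


def image_height(distance, angle_degrees):
    """Height that spans the given visual angle at the given viewing distance."""
    return 2 * distance * math.tan(math.radians(angle_degrees) / 2)


# Viewing from 50 cm, as in the example above.
for angle in (4, 6):
    print(angle, "degrees ->", round(image_height(50, angle), 1), "cm")
# 4 degrees -> 3.5 cm
# 6 degrees -> 5.2 cm
```

So at 50 cm the ideal image is roughly 3.5 to 5 cm high, with the quoted 2 inches/5 cm sitting at the top of that range.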
This is my all-time favorite technique. I even wrote the Wikipedia page!
Remember the big table from last week
I showed this table in the previous webinar. It's got too many columns to view on the page.
But, as I showed you, if we inserted a heatmap, we could easily visualize everything.
Insert > Visualization > Heatmap
Last week it had a really clear pattern. This week, not so clear.
It looks like a pretty random pattern of tiles. What's the story?
The secret to a large table like this is to rearrange the rows and columns to make the patterns clearer.
The heatmap has an inbuilt tool which uses hierarchical cluster analysis to order the rows and columns.
Chart > Row sorting or dendrogram: Dendrogram
Column sorting… > Dendrogram
This does an OK job. We've got a couple of blocks.
Looking at the block at the top left, it's grouped together the four sugar free products.
We can see that they are all relatively strong in terms of weight conscious and health conscious.
And, looking at the bottom row, we can see that Coke and Pepsi are pretty similar, and they skew to being traditional, older, reliable.
So, moving the rows and columns around does make the patterns easier to see.
But, we can do better!
I'm going to manually move the rows and columns around and create an even better visualization. Along the way, you'll get a better feel for what diagonalization is about.
Chart > Row sorting or dendrogram: None
Column sorting… > None
Step 1 is to remove any junk from the table. I'm going to get rid of the NET and None of these rows.
We will start by finding the biggest number on the table.
That's Coke and Traditional.
Then, we are going to sort the rows from highest to lowest on this.
Then we sort the columns.
Manually move columns to something like this, explaining as you go:
So, we are moving rows around, and we are doing so until we've created a diagonal pattern in the data.
At a very basic level, the diagonal line is a pattern, which our brain can understand.
Then, once we see the pattern, we can attribute meaning to it.
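For anyone who wants to automate the manual steps above, here is a rough Python sketch. The row sort follows the steps as described (find the biggest number, sort rows high to low on that column). The column rule is an assumption on my part: each column is placed according to where its weight sits along the sorted rows. The table values and the function name `diagonalize` are hypothetical, not the webinar's data.

```python
def diagonalize(table, row_labels, col_labels):
    """Reorder rows and columns so that large values fall near the diagonal.

    Assumes every column has a positive total.
    """
    n_rows, n_cols = len(table), len(table[0])

    # Find the column that holds the biggest number in the table.
    _, _, top_col = max(
        (table[r][c], r, c) for r in range(n_rows) for c in range(n_cols)
    )

    # Sort the rows from highest to lowest on that column.
    row_order = sorted(range(n_rows), key=lambda r: -table[r][top_col])

    # Assumed rule: order columns by their center of mass along the sorted rows.
    def center_of_mass(c):
        weights = [table[r][c] for r in row_order]
        return sum(i * w for i, w in enumerate(weights)) / sum(weights)

    col_order = sorted(range(n_cols), key=center_of_mass)

    new_table = [[table[r][c] for c in col_order] for r in row_order]
    return (
        new_table,
        [row_labels[r] for r in row_order],
        [col_labels[c] for c in col_order],
    )


# Hypothetical brand-by-attribute percentages.
brands = ["Diet Coke", "Coke", "Coke Zero"]
attributes = ["Health conscious", "Traditional", "Masculine"]
scores = [
    [60, 10, 5],
    [5, 70, 20],
    [30, 15, 40],
]

new_scores, new_brands, new_attributes = diagonalize(scores, brands, attributes)
print(new_attributes)
for brand, row in zip(new_brands, new_scores):
    print(brand, row)
```

With this toy table the rows come out as Coke, Coke Zero, Diet Coke and the big numbers (70, 40, 60) line up on the diagonal. On a real table you would still want to nudge things by hand.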
At the top left, Coke and Pepsi are very similar.
What do they have in common? They skew more to being traditional, older, reliable, honest.
Coke's darker so stronger than Pepsi. But, they're pretty similar.
Pepsi Max and Coke Zero are similar. But, Pepsi Max is a bit more masculine, confident and tough. It's closer to Coke and Pepsi.
Coke Zero is much more oriented towards weight and health conscious.
Diet Coke and Diet Pepsi are really strong with weight and health, but a bit more feminine and innocent.
Simplifying the data
When we bank and diagonalize data, we make it so that our brains find the patterns simpler to spot.
Another way of simplifying is to simplify the data.
This is historically the bread and butter of market research, so don't worry, I won't bore you by talking about top 2 boxes and other ways of aggregating data.
Similarly, I won't show you correspondence analysis again, but will point out it is also simplifying data, so the same strategy.
Let's go back to our big table and simplify it further.
Remember the big table
Looking at the heatmap, we can see that there are lots of columns that aren't really adding much to the story.
Let's just get rid of all the ones that aren't adding much.
Similarly, on the right, Innocent through Wholesome show little in the way of differences between brands.
And, looking in the middle, we can remove Urban through Individualistic.
And, let's get rid of Imaginative through to Unconventional.
Heatmaps can be great, but, when you are comparing brands, they're rarely the best.
I tend to prefer to use market maps, such as created by correspondence analysis, which we talked about last week.
Or, small multiples of radar charts.
Insert > Visualization > Radar Charts
Show Small multiples
Inputs > Data manipulation > Switch rows and columns
What I love about small multiples of radar charts is that we convert all these numbers into shapes, and they are shapes that our brains are good at understanding. Looking at this one, we can easily see that Coke and Pepsi have the same shape.
I'm going to reorder these so that similar shapes are in columns. Note that I'm really diagonalizing again:
Order of panels: 1, 3, 5, 2, 4, 6
Diet Coke and Diet Pepsi have basically the same shape.
Let's put the labels on.
Chart > DATA LABELS > Show data labels
Oh. That's a bit of a mess. That's the challenge with small multiples of radar charts. Sometimes you get killed by the labels.
Another option is palm trees
Inputs > Chart Type > Palm
DATA MANIPULATION > Switch rows and columns
You can see here that it's pretty easy to compare the shapes again. Coke and Pepsi are the same shape, but Coke's stronger in everything, which is why it's higher.
See also that Pepsi Max and Coke Zero have similar shapes.
As do Diet Coke and Coke Zero.
These work well interactively.
Do some hovering.
But, if we need to go with PowerPoint, we could go with bars.
But, bars are boring.
And, remember I said last week how making plots symmetrical makes them better?
Let's go with a pyramid or spindles.
And let's tidy it up.
And let's swap around the axes.
Inputs: DATA MANIPULATION > Swap rows and columns
Is it pretty?
Not really. But, it's near impossible to fail to spot the key patterns in the data now.
Smoothers that are better than moving averages
Last week I talked about how small multiples are the number 1 data visualization invention that market researchers don't use.
This week I'm going to introduce you to the number 2: modern smoothers.
Here I've got some data on customer churn.
Let's plot this data as columns.
Insert > Visualization > Column Chart
It jumps around a lot. 99 out of 100 market researchers then go and apply a moving average.
Chart > Trend > Moving average
Displayr's defaulting to the 3 month moving average. This is wiggling around too much.
Let's try 6 months.
But, look, the moving average actually does a bad job. Look in June 2019. A huge spike. The moving average shows a small blip up.
According to the moving average, the data peaked in November 2019. But, look at the actual data. The high points are 2 and 5 months earlier.
What's going on?
The sad truth is that traditional moving averages are just dumb. Statisticians have known for more than 100 years that they are dumb. This is why whenever you see government data or anything done by a good statistician, they don't use moving averages.
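To see the kind of behavior being described, here is a small Python sketch. It assumes a trailing moving average (the average of the current and previous months), which is an assumption about the window alignment rather than something stated in the webinar, and the churn numbers are made up for illustration.

```python
def trailing_moving_average(values, window):
    """Average of the current value and the (window - 1) values before it."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            smoothed.append(sum(values[i + 1 - window : i + 1]) / window)
    return smoothed


# Hypothetical monthly churn (%), with a one-off spike in month 5.
churn = [4, 4, 5, 4, 5, 12, 5, 9, 6, 6, 4, 4]
smoothed = trailing_moving_average(churn, 6)

raw_peak = max(range(len(churn)), key=lambda i: churn[i])
smoothed_peak = max(
    (i for i, v in enumerate(smoothed) if v is not None),
    key=lambda i: smoothed[i],
)
print(raw_peak, smoothed_peak)  # 5 9
```

The spike of 12 is flattened to a peak of about 7.2, and that peak is reported four months after the spike actually happened. Every month in the window gets the same weight, so an old spike keeps propping up the average until it finally drops out of the window.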
There are lots of alternatives.
Three of the popular ones are in Displayr and Q.
One of them is called locally estimated scatterplot smoothing, or loess for short.
It's doing a much better job. It works out that the June 2019 peak was an aberration. And, it picks that since September things have been getting better.
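For the curious, the core idea of loess can be sketched in plain Python: at each point, fit a straight line to the nearby data, giving closer points more weight. This is a bare-bones illustration with tricube weights and no robustness iterations. It is not the implementation used in Displayr or Q, and the churn numbers and `span` default are made up.

```python
def loess(x, y, span=0.5):
    """Minimal loess: a weighted straight-line fit around each point."""
    n = len(x)
    k = min(n, max(3, round(span * n)))  # how many neighbors define "nearby"
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        h = sorted(dists)[k - 1] or 1.0  # bandwidth: distance to k-th neighbor
        # Tricube weights: close to 1 nearby, falling to 0 at the bandwidth.
        w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical monthly churn (%), with a one-off spike in month 5.
months = list(range(12))
churn = [4, 4, 5, 4, 5, 12, 5, 9, 6, 6, 4, 4]
for month, value in zip(months, loess(months, churn)):
    print(month, round(value, 2))
```

Because the window is centered on each month and weighted toward it, the smoothed line stays lined up with the data instead of trailing behind it. The `span` argument plays the same role as choosing between a 3 and a 6 month moving average.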
Another one, with a cooler name, is Friedman's super smoother.
Line of best fit: Friedman's super smoother
Or, you can have a cubic spline.
Which is best? Just like with choosing whether to do a 3 or 6 month moving average, it comes down to the story you want to tell.
These techniques are always better than moving averages. Always is a strong word. And I mean it. If your goal is to accurately summarize the data, these techniques are provably always better than a moving average.
Nothing comes for free however.
The math is hard. You need to point, click and trust. You are unlikely to ever understand the math. And, unlike a moving average, when you get new data, the historical line will update, so they're not great for setting performance targets for bonuses.
Adding averages or other normative information to a chart is another way of making things easier to spot.
Small multiples of radar charts
Here's a set of small multiples of radar charts where we can read the labels. Yeh!
But, the shapes are all pretty similar, and the main thing that jumps out at us is that the sizes differ.
If we super-impose the average on the charts, it all becomes much clearer
We can now easily see that Aldi is, by far, the big price fighter. Woolworths and Coles are basically identical, and IGA and Foodland have the same positioning as Woolworths and Coles, but weaker.
Density plots like this are often used to show the distribution of a variable. This one's showing the time it takes for somebody to purchase our software after they start their trial.
What would you say is the median?
Heated density plot
Here I've used heatmap shading to make it clear that the median is in yellow. It both looks prettier and leaves you with a more informed view.
We can easily see that the median time is about 50 days.
Everything I have shown so far is really a part of a single basic idea: we want to force data and visualizations to correspond to shapes that people can recognize.
And, we want to exaggerate the shape if at all possible.
To use some jargon: We want to supernormalize the visualization.
The herring gull's world
Dutch Nobel laureate Niko Tinbergen did a cruel but fascinating study. He noticed that birds tended to instinctively peck at their mothers' beaks as soon as they hatched. So, he took away their mothers to see what they would do, and showed them a cardboard cut-out. They pecked at it.
He then showed them a two beaked monster. They pecked at it.
A thin red…
The fascinating finding was that they pecked more ferociously at a red rod with three white lines than a plaster model of their mother’s head.
This is because the instincts in the brain have evolved to have a simplified concept of what to peck at, and the supernormal stimulus of the red rod taps into this better.
An Excel-style line chart
When we see a line chart like this, it is not something we have evolved instincts to interpret.
It just looks like spaghetti.
We have not evolved to look for patterns in spaghetti.
Which is why small multiples of line and area charts are so much better, as shown last week.
We instinctively understand space
We instinctively understand space.
This is why correspondence analysis maps work so well.
We instinctively understand height
We have evolved to compare the height of things. This is why column charts are so good.
We instinctively understand proportionality
We can all tell how much of a cake has been eaten. This is why no matter how many people tell us that pie charts are rubbish, we still use them anyway. We know they work.
We instinctively understand line and area charts
Through our history we have needed to judge the slopes of mountains and hills, so it is no surprise we find line and area charts easy to interpret.
We understand footprints for the same reason.
Our forebears were hunter-gatherers, and so we are good at disambiguating shapes in leaves.
And that's a reason why small multiples work. We are good at comparing shapes. So long as they are recognizable. And not too complicated.
We instinctively understand heat
We instinctively understand heat.
Treemap with heatmap shading
Which is why typhoid jumps out at us on this visualization, even though its font is so small.
Treemap with unnatural heatmap shading
But, this grey, red, green scale is not natural, which makes this visualization hard to process.
This is a chord diagram. Oh so cool. But, not so natural to read.
It looks like the Great Pit of Carkoon from Return of the Jedi. Fascinating, yes, but not somewhere we expect to find a pattern.
This is often described as the greatest visualization of all time. But, it needs to be explained to you before you can understand it. How many people bother?
This is awesome. Not pretty. But awesome. The small multiples have been lined up to make patterns clear. If I ask you which industries first declined and then picked themselves up, you can quickly work it out.
Universal commercial history
This is my favorite visualization of all time. Each area shows the size of an economy, from 2000 BC through to 1804. Note the middle of the map. The Dark Ages, when only Constantinople was around.
Look at what the War of Independence did to the US economy.
Note it's diagonalized.
Note that the curves are banked to the extent possible.
Note it's using small multiples.
Over the two weeks we've covered most of the 24 techniques here.
There's a book
As mentioned earlier, you can download our data visualization ebook for more detail.