13 February 2017
Using Correspondence Analysis to Find Patterns in Tables
There are lots of great visualizations designed for analyzing big quantities of data. Heatmaps, for example, are super-popular. However, when I am in a rush, my “go to” approach to analyzing big tables is almost always correspondence analysis. Although a bit more technical, it tends to get me to the key insights much faster.
A Correspondence Analysis Example
The data below shows the proportion of people who associate each of 15 personality attributes with each of 42 brands. The table is too big to digest readily.
The scatterplot below shows the results of a correspondence analysis of the same table. Correspondence analysis identifies the main relationships between the rows and columns of a table and plots them on a two-dimensional map. You could use more dimensions, but as computer screens are two-dimensional, maps with more dimensions tend to be harder to read.
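For readers who want to see the mechanics, here is a minimal sketch of the textbook calculation behind such a map: a singular value decomposition of the table's standardized residuals. It is not necessarily how Displayr computes its maps. The function name `correspondence_analysis`, the brand labels, and the counts are all made up for illustration, and the sketch assumes NumPy is installed.

```python
import numpy as np

def correspondence_analysis(table, n_dims=2):
    """Return (row_coords, col_coords, inertia_share) for a non-negative table."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()          # correspondence matrix (cells as shares of the total)
    r = P.sum(axis=1)        # row masses
    c = P.sum(axis=0)        # column masses
    # Standardized residuals from the independence model (outer product of masses).
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates for rows and columns on the first n_dims dimensions.
    rows = (U[:, :n_dims] * sv[:n_dims]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :n_dims] * sv[:n_dims]) / np.sqrt(c)[:, None]
    # Share of total inertia (variation in the table) shown on each dimension.
    inertia_share = sv[:n_dims] ** 2 / np.sum(sv ** 2)
    return rows, cols, inertia_share

# Hypothetical counts: 4 brands (rows) by 3 attributes (columns).
brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
attributes = ["Upper-class", "Tough", "Cheerful"]
counts = [[60, 10, 15],
          [12, 55, 20],
          [10, 15, 50],
          [30, 28, 32]]

rows, cols, share = correspondence_analysis(counts)
for name, (x, y) in zip(brands, rows):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Plotting `rows` and `cols` on the same axes gives the kind of map described above. The signs of the axes are arbitrary, so a map may appear mirrored from one piece of software to another without changing its interpretation.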
Interpreting Correspondence Analysis
The chart above is much simpler to digest than the whole table. At the bottom-left, we can see that Calvin Klein, American Express, Apple, and Lexus are Upper-class. Porsche mixes Upper-class and Daring. At the top-left, we can see that Tough is shared by Nike, Reebok, Levi’s, and Michelin, which are also a bit Outdoorsy.
One key tip if you are new to correspondence analysis: the closer anything is to the middle of the map, the less distinct it is. Thus, on this map, we can see that Qantas is poorly described by any of the personality attributes. Similarly, Successful and Imaginative are personality attributes that are not good differentiators between the brands.
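The reason the middle of the map means "less distinct" can be sketched with a little arithmetic. In the full-dimensional solution, a row's distance from the origin equals the chi-square distance between that row's profile and the average profile; a two-dimensional map only approximates this. The helper name `distinctiveness` and the counts below are hypothetical.

```python
def distinctiveness(table):
    """Chi-square distance of each row profile from the average row profile."""
    total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    # Average profile: each column's share of the grand total.
    col_mass = [sum(row[j] for row in table) / total for j in range(n_cols)]
    distances = []
    for row in table:
        row_total = sum(row)
        profile = [x / row_total for x in row]
        d2 = sum((p - c) ** 2 / c for p, c in zip(profile, col_mass))
        distances.append(d2 ** 0.5)
    return distances

# Hypothetical counts: 4 brands (rows) by 3 attributes (columns).
counts = [[60, 10, 15],   # strongly skewed towards the first attribute
          [12, 55, 20],
          [10, 15, 50],
          [30, 28, 32]]   # close to the average profile
d = distinctiveness(counts)
print([round(x, 3) for x in d])
```

The last row has by far the smallest distance, so it would sit near the middle of a map, much as Qantas does in the example above.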
We can also see that a continuum of sorts is evident in the data. It goes from Upper-class and Intelligent at the bottom-left, through to Cheerful and Down-to-earth at the top-right.
As is always the case when we fit a model to data, there is no free lunch. Correspondence analysis just summarizes the data. Like many summaries, it can be superficial and at times misleading. For this reason, I always check that any key conclusions that I draw from a correspondence analysis are also clearly visible in the original data table or a heatmap.
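One way to do this check without eyeballing the whole table is to compare each cell with what would be expected if brands and attributes were unrelated. The following is a rough sketch of that idea using standardized residuals, not a description of any particular product's output; the function name `standardized_residuals` and the counts are hypothetical.

```python
def standardized_residuals(table):
    """(observed - expected) / sqrt(expected) for each cell, under independence."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            res_row.append((observed - expected) / expected ** 0.5)
        residuals.append(res_row)
    return residuals

# Hypothetical counts: 4 brands (rows) by 3 attributes (columns).
counts = [[60, 10, 15],
          [12, 55, 20],
          [10, 15, 50],
          [30, 28, 32]]
res = standardized_residuals(counts)
for row in res:
    print([round(x, 2) for x in row])
```

A large positive residual supports a conclusion such as "this brand is strongly associated with this attribute". If the map suggests an association but the corresponding residual is close to zero, the map is probably misleading on that point.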
You can explore the correspondence analysis used in this example yourself or create your own correspondence analysis here in Displayr.
Author: Tim Bock
Tim Bock is the founder of Displayr. Tim is a data scientist who has consulted, published academic papers, and won awards for problems and techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q (www.qresearchsoftware.com), a data science product designed for survey research, which is used by all seven of the world’s largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.