17 August 2017
Adding Supplementary Points to a Correspondence Analysis
Trends of brand perceptions
An earlier post describes how you can use correspondence analysis to analyze trends. I have repeated one of the scatterplots from this earlier post below. It shows the change in people’s perceptions of technology brands from 2012 to 2017.
The plot has Principal coordinates normalization. This means that the distances between row points and the distances between column points are meaningful, but not necessarily the distance between row and column points. Click here for a further explanation about interpretation and normalization.
Separating core and supplementary points
An alternative way to analyze the same data is to consider 2017 as the “ground truth” and plot 2012 as supplementary points. This means that the 2017 data alone determine the dimensions and axes of the map. The 2012 data are then added afterwards as supplementary points, positioned on the axes that the 2017 data have already fixed. In technical correspondence analysis terminology, the 2012 rows have zero mass.
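To make the idea of zero-mass rows concrete, here is a minimal sketch in Python with NumPy. It is not the code behind the charts in this post (those were produced in Displayr), and the counts are made up purely for illustration. The function names `fit_ca` and `place_supplementary_rows` are my own. The sketch fits a simple correspondence analysis to the core table only, then places the supplementary rows by multiplying their row profiles by the column standard coordinates.

```python
import numpy as np

def fit_ca(table):
    """Simple correspondence analysis of a table of counts (rows x columns)."""
    P = table / table.sum()                 # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)     # row and column masses
    # Standardized residuals from the independence model
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    return {
        "row_principal": (U / np.sqrt(r)[:, None]) * sv,
        "col_standard": Vt.T / np.sqrt(c)[:, None],
    }

def place_supplementary_rows(fit, sup_rows, ndim=2):
    """Project extra rows onto the existing axes; they carry zero mass,
    so they cannot change the axes themselves."""
    profiles = sup_rows / sup_rows.sum(axis=1, keepdims=True)
    return profiles @ fit["col_standard"][:, :ndim]

# Made-up counts (hypothetical): rows are brands, columns are attributes.
core_2017 = np.array([[40., 25., 10., 25.],
                      [15., 30., 35., 20.],
                      [20., 10., 30., 40.]])
sup_2012 = np.array([[30., 30., 15., 25.],
                     [20., 25., 30., 25.],
                     [25., 15., 25., 35.]])

fit = fit_ca(core_2017)                      # only 2017 defines the map
coords_2017 = fit["row_principal"][:, :2]    # core points
coords_2012 = place_supplementary_rows(fit, sup_2012)  # added afterwards
```

Because `fit_ca` never sees `sup_2012`, the 2017 coordinates are identical whether or not the 2012 rows are plotted. As a sanity check, projecting the core rows themselves with `place_supplementary_rows` reproduces `coords_2017`.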
We can see below that while the output shows the same themes as the first chart, it differs in the detail. Easy to use and Innovative are now closer together. We can deduce that, on the basis of the 2017 data alone, Easy to use and Innovative have more similar meanings.
Yet another perspective is to consider 2012 the ground truth and to then plot the 2017 points as supplementary. This produces the results below where Easy to use and Innovative are further apart than in the original chart. Evidently, the association between innovation and ease of use is a more recent phenomenon.
All three charts are equally valid views of the data. They differ in their emphasis. For instance, the second chart would be most relevant for a study on the state of the technology market in 2017. In this case, the 2012 data is added for context but does not influence the positioning of the 2017 points.
Note that the first chart from the previous post is an “average” (in a strictly non-technical, hand-waving sense!) of the 2012 and 2017 charts.
Focusing on a subset of data
The second example below is the correspondence analysis resulting from a table of 14 car models. Let’s say we wanted to study the 4 German brands. They form a line across the top from Volkswagen on the left, through Audi and Mercedes, then BMW. The chart has Row principal normalization. This means that it is valid to compare distances between row points. It is also valid to measure the association between rows and columns by their scalar products.
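The scalar product property can be checked numerically. The sketch below assumes the usual definition of row principal normalization, in which rows are in principal coordinates and columns are in standard coordinates. The table is made up for illustration and is not the car data from this post. Across all dimensions, the scalar product of a row point and a column point equals the observed proportion divided by the proportion expected under independence, minus one. A two-dimensional map shows only an approximation of this.

```python
import numpy as np

# Made-up counts (hypothetical): rows are car models, columns are attributes.
table = np.array([[30., 10., 20.,  5.],
                  [12., 28., 15., 10.],
                  [ 8., 14., 30., 25.],
                  [20., 18., 10., 30.]])

P = table / table.sum()                  # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)      # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

rows = (U / np.sqrt(r)[:, None]) * sv    # rows: principal coordinates
cols = Vt.T / np.sqrt(c)[:, None]        # columns: standard coordinates

# Association of each cell relative to independence: observed / expected - 1
association = P / np.outer(r, c) - 1

exact = rows @ cols.T                    # scalar products over all dimensions
approx_2d = rows[:, :2] @ cols[:, :2].T  # what a two-dimensional map conveys
```

Here `exact` matches `association` cell by cell, while `approx_2d` differs from it by whatever sits in the dimensions that are not plotted.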
We might be tempted to say that the Volkswagen is Popular, the Audi and Mercedes are Luxury and the BMW X5 is Sporty. Before doing so, note that the total explained variance is only 53%. This means that nearly half of the information is hidden in the dimensions that are not plotted.
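For readers who want to see where a figure like this comes from, here is a rough sketch. In correspondence analysis the "variance" is usually called inertia, and each dimension's share is its squared singular value divided by the sum of all squared singular values. The function name `inertia_shares` and the counts are my own inventions for illustration, so the number this produces has nothing to do with the 53% above.

```python
import numpy as np

def inertia_shares(table):
    """Share of total inertia carried by each dimension of a simple
    correspondence analysis, largest first."""
    P = table / table.sum()                  # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)      # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    sv = np.linalg.svd(S, compute_uv=False)  # singular values, descending
    inertia = sv ** 2                        # principal inertias
    return inertia / inertia.sum()

# Made-up counts (hypothetical): rows are car models, columns are attributes.
table = np.array([[30., 10., 20.,  5.],
                  [12., 28., 15., 10.],
                  [ 8., 14., 30., 25.],
                  [20., 18., 10., 30.]])

shares = inertia_shares(table)
explained_2d = shares[:2].sum()   # proportion shown by a two-dimensional map
```

Whatever is left over, `1 - explained_2d`, is the part of the table that a two-dimensional plot cannot show.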
Let’s repeat the analysis, this time treating all the non-German cars as supplementary. Now we see that the Audi A4 is very near the center of the plot. This means that it is not strongly associated with any of the characteristics. We can conclude that amongst all 14 cars the Audi is considered a luxury car, but amongst the German cars, it is not. Note also that the total explained variance below is now almost 97%. This means that we can be more confident about our conclusions.
There is also a close relationship between Family and Sporty. Evidently, the German cars discriminate relatively little between those characteristics.
Finally, we can check the result above by removing the supplementary points. This produces the chart below, which is the same except we can no longer see how the German cars relate to the non-Germans.
You can add data to a “core” correspondence analysis as supplementary points. The advantage of supplementary points over just analyzing all the data together is that supplementary points do not influence the placement of core data points. As the name implies, they are added after the core data has determined the map. Supplementary data points are an excellent way to provide additional context to an analysis that is driven entirely by another part of the data set.
TRY IT OUT
All the analysis in this post was conducted in Displayr. Review the worked example from this post or run your own analysis by clicking through to this correspondence analysis example (just sign into Displayr first). The supplementary points are specified in the Inputs panel, seen on the right after clicking on any map.
The analysis used the flipDimensionReduction package (available on GitHub), which itself uses the ca package for correspondence analysis.
Author: Jake Hoare
After escaping from physics to a career in banking, then escaping from banking, I decided to go back to BASIC and study computing. This led me to rediscover artificial intelligence and data science. I now get to indulge myself at Displayr working in the Data Science team, sometimes on machine learning.