# Adding Supplementary Points to a Correspondence Analysis

### Trends of brand perceptions

An earlier post describes how you can use correspondence analysis to analyze trends. I have repeated one of the scatterplots from this earlier post below. It shows the change in people’s perceptions of technology brands from 2012 to 2017.

The plot has *Principal coordinates normalization*. This means that the distances between row points and the distances between column points are meaningful, but not necessarily the distance between row and column points. Click here for a further explanation about interpretation and normalization.
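To make "meaningful distances" concrete, here is a minimal NumPy sketch; it is not the Displayr implementation. It assumes a small, made-up table of counts (the numbers and variable names are hypothetical). With all dimensions kept, the Euclidean distance between two row points in principal coordinates equals the chi-square distance between their row profiles.

```python
import numpy as np

# Hypothetical counts: rows are brands, columns are attributes.
counts = np.array([
    [120.0, 80.0, 40.0, 60.0],
    [60.0, 110.0, 70.0, 50.0],
    [30.0, 50.0, 130.0, 90.0],
    [70.0, 40.0, 60.0, 140.0],
])

P = counts / counts.sum()
r = P.sum(axis=1)  # row masses
c = P.sum(axis=0)  # column masses

# Standardized residuals and their singular value decomposition.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates for rows and columns (all dimensions kept).
row_principal = (U / np.sqrt(r)[:, None]) * sv
col_principal = (Vt.T / np.sqrt(c)[:, None]) * sv

# Chi-square distance between the profiles of the first two rows.
profiles = P / r[:, None]
chi2_dist = np.sqrt((((profiles[0] - profiles[1]) ** 2) / c).sum())

# Euclidean distance between the same two rows on the full map.
map_dist = np.linalg.norm(row_principal[0] - row_principal[1])

print(chi2_dist, map_dist)
```

A two-dimensional plot shows only the first two columns of these coordinates, so plotted distances approximate the chi-square distances rather than matching them exactly.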

### Separating core and supplementary points

An alternative way to analyze the same data is to consider 2017 as the “ground truth” and plot 2012 as supplementary points. This means that the 2017 data alone determine the dimensions and axes of the map. The 2012 data are then added afterwards as supplementary points, positioned on the axes that the 2017 data have already fixed. In technical correspondence analysis terminology, the 2012 rows have zero *mass*.
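The idea of zero-mass rows can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the code behind the charts: the counts are invented, and `ca_fit` and `project_supplementary_rows` are hypothetical helper names. The core rows alone define the axes. Each supplementary row is then placed by multiplying its profile by the column standard coordinates.

```python
import numpy as np

def ca_fit(table, n_dims=2):
    """Correspondence analysis of the core table only (hypothetical helper)."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    U, sv, V = U[:, :n_dims], sv[:n_dims], Vt[:n_dims].T
    return {
        "row_principal": (U / np.sqrt(r)[:, None]) * sv,
        "col_standard": V / np.sqrt(c)[:, None],
        "singular_values": sv,
    }

def project_supplementary_rows(fit, sup_rows):
    """Place extra rows on the existing axes; they never enter the SVD."""
    sup = np.asarray(sup_rows, dtype=float)
    profiles = sup / sup.sum(axis=1, keepdims=True)
    return profiles @ fit["col_standard"]

# Hypothetical counts: rows are brands, columns are attributes.
core_2017 = np.array([
    [120, 80, 40, 60],
    [60, 110, 70, 50],
    [30, 50, 130, 90],
    [70, 40, 60, 140],
])
sup_2012 = np.array([
    [100, 90, 50, 60],
    [70, 90, 80, 60],
    [40, 60, 110, 80],
    [60, 50, 70, 120],
])

fit = ca_fit(core_2017)                              # axes come from 2017 only
sup_coords = project_supplementary_rows(fit, sup_2012)
print(fit["row_principal"])
print(sup_coords)
```

Swapping the two tables gives the opposite view, with 2012 as the core and 2017 as supplementary. A useful sanity check is that projecting the core rows themselves reproduces their own principal coordinates.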

We can see below that while the output shows the same themes as the first chart, it is different in the detail. *Easy to use* and *Innovative* are now closer together. We can now deduce that on the basis of 2017 data, *Easy to use* and *Innovative* have more similar meanings.

Yet another perspective is to consider 2012 the ground truth and to then plot the 2017 points as supplementary. This produces the results below where *Easy to use* and *Innovative* are further apart than in the original chart. Evidently, the association between innovation and ease of use is a more recent phenomenon.

All three charts are equally valid views of the data. They differ in their emphasis. For instance, the second chart would be most relevant for a study on the state of the technology market in 2017. In this case, the 2012 data is added for context but does not influence the positioning of the 2017 points.

Note that the first chart from the previous post is an “average” (in a strictly non-technical, hand-waving sense!) of the 2012 and 2017 charts.

### Focusing on a subset of data

The second example below is the correspondence analysis resulting from a table of 14 car models. Let’s say we wanted to study the 4 German brands. They form a line across the top from *Volkswagen* on the left, through *Audi* and *Mercedes*, then *BMW*. The chart has *Row principal* normalization. This means that it is valid to compare distances between row points. It is also valid to measure the association between rows and columns by their scalar products.
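The scalar-product interpretation can be checked numerically. The sketch below uses invented counts, not the car data from the chart, and assumes that row principal normalization means rows in principal coordinates and columns in standard coordinates. With all dimensions kept, the scalar product of a row point and a column point equals the ratio of the observed to the expected proportion, minus one.

```python
import numpy as np

# Hypothetical counts: rows are car models, columns are characteristics.
counts = np.array([
    [90.0, 30.0, 20.0, 40.0],
    [25.0, 85.0, 35.0, 30.0],
    [20.0, 40.0, 95.0, 45.0],
    [35.0, 30.0, 50.0, 80.0],
    [60.0, 55.0, 25.0, 35.0],
])

P = counts / counts.sum()
r = P.sum(axis=1)  # row masses
c = P.sum(axis=0)  # column masses

S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Row principal normalization: rows carry the singular values, columns do not.
row_principal = (U / np.sqrt(r)[:, None]) * sv
col_standard = Vt.T / np.sqrt(c)[:, None]

# Scalar products between every row point and every column point.
scalar_products = row_principal @ col_standard.T

# Observed / expected - 1: positive means the pair occurs more than expected.
association = P / np.outer(r, c) - 1.0

print(np.round(scalar_products, 3))
```

On a two-dimensional plot only the first two columns of each coordinate matrix are drawn, so the plotted scalar products approximate this association. The quality of that approximation depends on the explained variance.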

We might be tempted to say that the *Volkswagen* is *Popular*, the *Audi* and *Mercedes* are *Luxury* and the *BMW X5* is *Sporty*. Before doing so, note that the total explained variance is only 53%. This means there is information hidden in the dimensions that are not plotted.
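As a rough sketch of where an explained-variance figure comes from (made-up counts again, so the numbers will not match the 53% above): each dimension's share of the total inertia is its squared singular value divided by the sum of all squared singular values. The figure quoted for a two-dimensional map is the sum of the first two shares.

```python
import numpy as np

# Hypothetical counts: rows are car models, columns are characteristics.
counts = np.array([
    [90.0, 30.0, 20.0, 40.0],
    [25.0, 85.0, 35.0, 30.0],
    [20.0, 40.0, 95.0, 45.0],
    [35.0, 30.0, 50.0, 80.0],
    [60.0, 55.0, 25.0, 35.0],
])

P = counts / counts.sum()
r = P.sum(axis=1)  # row masses
c = P.sum(axis=0)  # column masses

S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
sv = np.linalg.svd(S, compute_uv=False)

inertia = sv ** 2                      # inertia carried by each dimension
explained = inertia / inertia.sum()    # share of total inertia per dimension
two_dim_share = explained[:2].sum()    # what a two-dimensional map displays

print(np.round(explained, 3), round(float(two_dim_share), 3))
```

The total inertia is the table's chi-square statistic divided by the grand total, so a low two-dimensional share means a good part of the departure from independence lies in dimensions the map does not show.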

Let’s repeat the analysis, this time treating all the non-German cars as supplementary. Now we see that the *Audi A4* is very near the center of the plot. This means that it is not strongly associated with any of the characteristics. We can conclude that amongst all 14 cars the *Audi* is considered a luxury car, but amongst the German cars, it is not. Note also that the total explained variance below is now almost 97%. This means that we can be more confident about our conclusions.

There is also a close relationship between *Family* and *Sporty*. Evidently, the German cars discriminate relatively little between those characteristics.

Finally, we can check the result above by removing the supplementary points. This produces the chart below, which is the same except we can no longer see how the German cars relate to the non-German cars.

### Conclusion

You can add data to a “core” correspondence analysis as supplementary points. The advantage of supplementary points over just analyzing all the data together is that supplementary points do not influence the placement of core data points. As the name implies, they are added after the core data has determined the map. Supplementary data points are an excellent way to provide additional context to an analysis that is driven entirely by another part of the data set.

### Try it out

All the analysis in this post was conducted in Displayr. Review the worked example from this post or run your own analysis by clicking through to this correspondence analysis example (just sign into Displayr first). The supplementary points are specified in the *Inputs* panel, seen on the right after clicking on any map.

The analysis used the flipDimensionReduction package (available on GitHub), which itself uses the ca package for correspondence analysis.

#### Author: Jake Hoare

After escaping from physics to a career in banking, then escaping from banking, I decided to go back to BASIC and study computing. This led me to rediscover artificial intelligence and data science. I now get to indulge myself at Displayr working in the Data Science team, often on machine learning.