31 July 2017

# 8 Tips for Interpreting R-Squared

Hopefully, if you have landed on this post, you have a basic idea of what the *R-Squared* statistic is. The *R-Squared* statistic is a number between 0 and 1 (or, equivalently, 0% and 100%) that quantifies the proportion of variance explained by a statistical model. It is the same thing as *r-squared*, *R-square*, the *coefficient of determination*, *variance explained*, the *squared correlation*, *r²*, and *R²*.

We get quite a few questions about its interpretation from users of Q and Displayr, so I am taking the opportunity to answer the most common questions as a series of tips for using *R²*.
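As a quick refresher on the definition, here is a minimal sketch in Python with NumPy. The simulated data, the seed, and the helper name `r_squared` are illustrative assumptions, not something taken from Q or Displayr. It computes *R-Squared* as one minus the ratio of residual to total sum of squares, and checks that, for a straight-line fit with an intercept, this matches the squared correlation.

```python
import numpy as np

def r_squared(y, y_hat):
    """R-squared as 1 - (residual sum of squares / total sum of squares)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Simulated data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(size=200)

# Least-squares straight line; np.polyfit returns the highest power first.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

r2 = r_squared(y, y_hat)
squared_corr = np.corrcoef(x, y)[0, 1] ** 2
print(f"R-squared: {r2:.3f}, squared correlation: {squared_corr:.3f}")
```

The two numbers agree here because the model is a simple linear regression with an intercept; for other kinds of model the two definitions can differ.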

## 1. Don’t conclude a model is “good” based on the R-squared

The basic mistake that people make with *R-squared* is to try and work out if a model is “good” or not, based on its value. There are two flavors of this question:

- “My *R-Squared* is 75%. Is this good?”
- “My *R-Squared* is only 20%; I was told that it needs to be 90%”.

The problem with both of these questions is that it is just a bit silly to work out if a model is good or not based on the value of the *R-Squared* statistic. Sure, it would be great if you could check a model by looking at its *R-Squared*, but it makes no sense to do so. Most of the rest of the post explains why.

I will point out a caveat to this tip. It is pretty common to develop rules of thumb. For example, in driver analysis, models often have *R-Squared* values of around 0.20 to 0.40. But keep in mind that, even if you are doing a driver analysis, having an *R-Squared* in this range, or better, does not make the model valid.

## 2. Use R-Squared to work out overall fit

Sometimes people take point 1 a bit further, and suggest that *R-Squared* is always bad. Or, that it is bad for special types of models (e.g., don’t use *R-Squared* for non-linear models). This is a case of throwing the baby out with the bath water. There are quite a few caveats, but as a general statistic for summarizing the strength of a relationship, *R-Squared* is awesome. All else being equal, a model that explains 95% of the variance is likely to be a whole lot better than one that explains 5% of the variance, and will likely produce much, much better predictions.

Of course, often all is not equal, so read on.

## 3. Plot the data

When interpreting the *R-Squared* it is almost always a good idea to plot the data. That is, create a plot of the observed data and the predicted values of the data. This can reveal situations where *R-Squared* is highly misleading. For example, if the observed and predicted values do not appear as a cloud formed around a straight line, then the *R-Squared*, and the model itself, will be misleading. Similarly, outliers can exaggerate the *R-Squared* statistic, or make it much smaller than is appropriate to describe the overall pattern in the data.
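To make the outlier point concrete, here is a small simulated sketch in Python with NumPy. The data, the seed, and the helper name `fit_r_squared` are assumptions for illustration only. Two variables are generated independently of each other, and then a single extreme point is appended.

```python
import numpy as np

def fit_r_squared(x, y):
    """R-squared of a least-squares straight line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    resid = y - (intercept + slope * x)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = rng.normal(size=50)  # generated independently of x: no real relationship

r2_clean = fit_r_squared(x, y)

# Append one extreme observation far away from the rest of the cloud.
x_out = np.append(x, 30.0)
y_out = np.append(y, 30.0)
r2_outlier = fit_r_squared(x_out, y_out)

print(f"Without the outlier: {r2_clean:.2f}")
print(f"With one outlier:    {r2_outlier:.2f}")
```

A scatterplot of `x_out` against `y_out` would show the problem at a glance: a shapeless cloud plus one distant point that the fitted line is forced to pass close to.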

## 4. Be very afraid if you see a value of 0.9 or more

In 25 years of building models, of everything from retail IPOs through to drug testing, I have never seen a good model with an *R-Squared* of more than 0.9. Such high values always mean that something is wrong, usually seriously wrong.

## 5. Take context into account

There are a lot of different factors that can cause the value to be high or low. This makes it dangerous to conclude that a model is good or bad based solely on the *R-Squared*. For example:

- When your predictor or outcome variables are categorical (e.g., rating scales) or counts, the *R-Squared* will typically be lower than with truly numeric data.
- The more true noise in the data, the lower the *R-Squared*. For example, if building models based on stated preferences of people, there is a lot of noise so a high *R-Squared* is hard to achieve. By contrast, models of astronomical phenomena are the other way around.
- When you have more observations, the *R-Squared* gets lower.
- When you have more predictor variables, the *R-Squared* gets higher (this is offset by the previous point; the lower the ratio of observations to predictor variables, the higher the *R-Squared*).
- If your data is not a simple random sample the *R-Squared* can be inflated. For example, consider models based on time series data or geographic data. These are rarely simple random samples, and tend to get much higher *R-Squared* statistics.
- When your model excludes variables that are obviously important, the *R-Squared* will necessarily be small. For example, if you have a model looking at how brand imagery drives brand preference, and your model ignores practical things like price, distribution, flavor, and quality, the *R-Squared* is inevitably going to be small even if your model is great.
- Models based on aggregated data (e.g., state-level data) have much higher *R-Squared* statistics than those based on case-level data.
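Two of the points in the list above, extra predictors and aggregation, are easy to reproduce on simulated data. The following is a rough sketch in Python with NumPy; the data-generating choices (sample sizes, noise levels, 50 hypothetical groups standing in for states) are assumptions made for illustration.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R-squared of an OLS fit of y on X (an intercept is added)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)

# (a) Adding predictors, even pure noise, cannot lower in-sample R-squared.
n = 40
x = rng.normal(size=n)
y = x + rng.normal(scale=2.0, size=n)
noise_predictors = rng.normal(size=(n, 15))  # unrelated to y by construction
r2_one = r_squared(x.reshape(-1, 1), y)
r2_many = r_squared(np.column_stack([x, noise_predictors]), y)
print(f"1 predictor: {r2_one:.2f}, 16 predictors: {r2_many:.2f}")

# (b) Aggregating case-level data to group means tends to raise R-squared.
n_groups, per_group = 50, 200
group_effect = np.repeat(rng.normal(size=n_groups), per_group)
x_case = group_effect + rng.normal(size=n_groups * per_group)
y_case = x_case + rng.normal(scale=3.0, size=n_groups * per_group)
r2_case = r_squared(x_case.reshape(-1, 1), y_case)

# Each row of the reshaped array holds the cases for one group.
x_agg = x_case.reshape(n_groups, per_group).mean(axis=1)
y_agg = y_case.reshape(n_groups, per_group).mean(axis=1)
r2_agg = r_squared(x_agg.reshape(-1, 1), y_agg)
print(f"Case-level: {r2_case:.2f}, group means: {r2_agg:.2f}")
```

In both cases the underlying relationship between `x` and `y` is unchanged; only the number of predictors or the level of aggregation differs, which is why the *R-Squared* on its own says little about model quality.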

## 6. Think long and hard about causality

For the *R-Squared* to have any meaning at all in the vast majority of applications it is important that the model says something useful about causality. Consider, for example, a model that predicts adults’ height based on their weight and gets an *R-Squared* of 0.49. Is such a model meaningful? It depends on the context. But, for most contexts the model is unlikely to be useful. The implication, that if we get adults to eat more they will get taller, is rarely true.

But, consider a model that predicts tomorrow’s exchange rate and has an *R-Squared* of 0.01. If the model is sensible in terms of its causal assumptions, then there is a good chance that this model is accurate enough to make its owner very rich.

## 7. Don’t use R-Squared to compare models

A natural thing to do is to compare models based on their *R-Squared* statistics. If one model has a higher *R-Squared*, surely it is better? This is, as a pretty general rule, an awful idea. There are two different reasons for this:

- In many situations the *R-Squared* is misleading when compared across models. Examples include comparing a model based on aggregated data with one based on disaggregate data, or models where the variables are being transformed.
- Even in situations where the *R-Squared* may be meaningful, there are always better tools for comparing models. These include *F-Tests*, *Bayes’ Factors*, *Information Criteria*, and out-of-sample predictive accuracy.
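As one illustration of the last alternative in the list above, out-of-sample predictive accuracy, here is a minimal sketch in Python with NumPy. The simulated data, the polynomial degrees, and the train/holdout sizes are all assumptions chosen for illustration. A straight line and a degree-8 polynomial are fitted to the same small training set and then scored on fresh data from the same process.

```python
import numpy as np

def r_squared(y, y_hat):
    """R-squared as 1 - (residual sum of squares / total sum of squares)."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)

def simulate(n):
    """The true relationship is a straight line plus noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)
    return x, y

x_train, y_train = simulate(20)
x_test, y_test = simulate(500)

results = {}
for degree in (1, 8):
    coefs = np.polyfit(x_train, y_train, deg=degree)
    in_sample_r2 = r_squared(y_train, np.polyval(coefs, x_train))
    holdout_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    results[degree] = (in_sample_r2, holdout_mse)
    print(f"degree {degree}: in-sample R-squared = {in_sample_r2:.2f}, "
          f"holdout MSE = {holdout_mse:.2f}")
```

The more flexible model cannot have a lower in-sample *R-Squared*, because the straight line is a special case of it. Its holdout error will typically be worse in a setup like this one, which is the comparison that actually matters for prediction.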

## 8. Don’t interpret pseudo R-Squared statistics as explaining variance

Technically, *R-Squared* is only valid for linear models with numeric data. While I find it useful for lots of other types of models, it is rare to see it reported for models using categorical outcome variables (e.g., logit models). Many *pseudo R-squared* statistics have been developed for such purposes (e.g., *McFadden’s Rho*, *Cox & Snell*). These are designed to mimic *R-Squared* in that 0 means a bad model and 1 means a great model. However, they are fundamentally different from *R-Squared* in that they do not indicate the variance explained by a model. For example, if McFadden’s Rho is 50%, even with linear data, this does not mean that it explains 50% of the variance. No such interpretation is possible. In particular, many of these statistics can never get to a value of 1.0, even if the model is “perfect”.
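The ceiling on some of these statistics can be seen directly from their textbook formulas, which are not given in this post and are assumed here: McFadden’s statistic is 1 − LL_model / LL_null, and Cox & Snell’s is 1 − exp(2 (LL_null − LL_model) / n), where LL denotes a log-likelihood. The sketch below, in plain Python, plugs in a hypothetical “perfect” binary-outcome model, one whose log-likelihood is 0 because it assigns probability 1 to every observed outcome.

```python
import math

def mcfadden(ll_model, ll_null):
    """McFadden's pseudo R-squared from model and null log-likelihoods."""
    return 1.0 - ll_model / ll_null

def cox_snell(ll_model, ll_null, n):
    """Cox & Snell pseudo R-squared from log-likelihoods and sample size."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

# Hypothetical binary outcome with a 50/50 split (an illustrative assumption).
n = 1000
p = 0.5
# Log-likelihood of the intercept-only model, which predicts p for everyone.
ll_null = n * (p * math.log(p) + (1 - p) * math.log(1 - p))
# A "perfect" model gives every observed outcome probability 1, so its LL is 0.
ll_perfect = 0.0

mcf = mcfadden(ll_perfect, ll_null)
cs = cox_snell(ll_perfect, ll_null, n)
print(f"McFadden:    {mcf:.2f}")  # 1.00
print(f"Cox & Snell: {cs:.2f}")   # 0.75
```

With a 50/50 outcome, Cox & Snell tops out at 0.75 no matter how good the model is, so reading it as “75% of variance explained” would be meaningless.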

Do you have any other tips? Tell me and I will add them to the post.

#### Author: Tim Bock

Tim Bock is the founder of Displayr. Tim is a data scientist, who has consulted, published academic papers, and won awards, for problems/techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q (www.qresearchsoftware.com), a data science product designed for survey research, which is used by all the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.

Hi Tim, thanks for the inspiring post. It may be worth mentioning that even an R^2 of 1 doesn’t (necessarily) mean a prediction *equals* an observed value. A perfect correlation remains, as is well known, insensitive to the means and the variances of the involved variables. Hence, perfect correlation (or R^2) only captures (linear) patterns, not identical values.

Second, and an implication of the note above, is that R^2 is only concerned with *relative* variance reduction (in comparison to the null model), not absolute. So, depending on the variance, a model with low R^2 may have lower *absolute* prediction error than a model with a higher R^2.

Very useful clarifications.

Thanks,

Tim
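The two points raised in the comment above can be illustrated with a short simulated sketch in Python with NumPy. All numbers, the seed, and the helper name `fit_and_score` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Point 1: a squared correlation of 1 does not mean predictions equal outcomes.
y = rng.normal(loc=50.0, scale=5.0, size=100)
pred = 2.0 * y + 10.0  # perfectly correlated with y, but on the wrong scale
squared_corr = np.corrcoef(y, pred)[0, 1] ** 2
rmse = np.sqrt(np.mean((y - pred) ** 2))
print(f"squared correlation = {squared_corr:.3f}, RMSE = {rmse:.1f}")

# Point 2: a low-R-squared model can have the smaller absolute error.
def fit_and_score(x, y):
    """Fit a straight line; return (R-squared, root mean squared error)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    y_hat = intercept + slope * x
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, np.sqrt(np.mean((y - y_hat) ** 2))

# Model A: outcome with little variance overall, so R-squared is low.
x_a = rng.normal(scale=0.5, size=500)
y_a = x_a + rng.normal(scale=1.0, size=500)
# Model B: outcome with a lot of variance, most of it explained by x.
x_b = rng.normal(scale=20.0, size=500)
y_b = x_b + rng.normal(scale=5.0, size=500)

r2_a, rmse_a = fit_and_score(x_a, y_a)
r2_b, rmse_b = fit_and_score(x_b, y_b)
print(f"A: R-squared = {r2_a:.2f}, RMSE = {rmse_a:.2f}")
print(f"B: R-squared = {r2_b:.2f}, RMSE = {rmse_b:.2f}")
```

Model A has the lower R-squared but also the lower root mean squared error, because there is simply less variance in its outcome to begin with.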