Shapley Value regression is a technique for working out the relative importance of predictor variables in linear regression. Its principal application is to resolve a weakness of linear regression, which is that it is not reliable when the predictor variables are moderately to highly correlated. Shapley Value regression is also known as Shapley regression, Shapley Value analysis, LMG, Kruskal analysis, dominance analysis, and incremental R-squared analysis.

Worked example

The first step in Shapley Value regression is to compute linear regressions using all possible combinations of predictors, computing the R-squared statistic for each regression. For example, if we have three predictors, A, B, and C, then eight linear regressions are estimated with the following combinations of predictors (hypothetical R-squared statistics are shown in brackets; they are also used in the code sketches further below):

  • No predictors (other than the intercept; R² = 0)
  • A only (R² = .4)
  • B only (R² = .2)
  • C only (R² = .1)
  • A and B (R² = .4)
  • A and C (R² = .5)
  • B and C (R² = .3)
  • A, B, and C (R² = .5)
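
For the sketches that follow, these hypothetical R-squared values can be stored in a small Python lookup table keyed by the set of predictors in each model (the encoding and variable names are purely illustrative; the values are the ones listed above):

    # Hypothetical R-squared values from the worked example, keyed by the
    # set of predictors included in each regression.
    R2 = {
        frozenset(): 0.0,      # no predictors (intercept only)
        frozenset("A"): 0.4,
        frozenset("B"): 0.2,
        frozenset("C"): 0.1,
        frozenset("AB"): 0.4,
        frozenset("AC"): 0.5,
        frozenset("BC"): 0.3,
        frozenset("ABC"): 0.5,
    }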

Before looking at the Shapley computations, take a second to think about a simple approach to computing importance: comparing the regressions that contain a single predictor. This would lead to the conclusion that A is twice as important as B, which in turn is twice as important as C.

Back to Shapley. For each predictor, we compute the average improvement in R-squared created by adding that variable to a model. In the case of predictor A, the improvements are:

  • .4 - 0 = .4 when added on its own (i.e., the A-only model minus the no-predictors model)
  • .4 - .2 = .2 when added to B (i.e., the A-and-B model minus the B-only model)
  • .5 - .1 = .4 when added to C
  • .5 - .3 = .2 when added to the regression with B and C

Shapley regression uses a weighted average of these numbers: 1/3 × .4 + 1/6 × .2 + 1/6 × .4 + 1/3 × .2 = .3. Thus, we can say that the average effect of adding A to a model is that it improves the R-squared statistic by .3.
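
A minimal sketch of this calculation for predictor A, using the R2 table defined earlier and the weights from the text (the variable names are illustrative, not part of any library):

    # Weighted average of A's incremental R-squared, with weights 1/3, 1/6, 1/6, 1/3.
    contributions_A = [
        (1/3, R2[frozenset("A")] - R2[frozenset()]),        # A added on its own
        (1/6, R2[frozenset("AB")] - R2[frozenset("B")]),    # A added to B
        (1/6, R2[frozenset("AC")] - R2[frozenset("C")]),    # A added to C
        (1/3, R2[frozenset("ABC")] - R2[frozenset("BC")]),  # A added to B and C
    ]
    shapley_A = sum(weight * gain for weight, gain in contributions_A)
    print(round(shapley_A, 3))  # 0.3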

The logic of the weights of 1/3, 1/6, 1/6, and 1/3 is that each model size is weighted equally, with that weight then shared among the possible models of that size. That is, there are two regressions where A is added to one other variable, but only one where A is added on its own, and only one where A is added to two other variables; the weighting scheme takes this into account, so that the total weight given to models with one predictor is 1/3, to models with two predictors is 1/3, and to models with three predictors is 1/3 (the sketch after the next list shows the general formula for these weights). Repeating the calculation for all three predictors, their average incremental improvements are:

  • A: .3
  • B: .1
  • C: .1
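
The same logic can be written as a general function. The sketch below (a hand-rolled illustration, not a library routine) uses the standard Shapley weight |S|!(p - |S| - 1)!/p!, where S is the set of other predictors already in the model and p is the total number of predictors; for p = 3 this reproduces the weights 1/3, 1/6, 1/6, and 1/3 above:

    from itertools import combinations
    from math import factorial

    def shapley_values(r2, predictors):
        """Average incremental R-squared for each predictor, averaged over
        all subsets of the other predictors using Shapley weights."""
        p = len(predictors)
        values = {}
        for x in predictors:
            others = [v for v in predictors if v != x]
            total = 0.0
            for size in range(p):
                # Weight for subsets of this size: |S|! (p - |S| - 1)! / p!
                weight = factorial(size) * factorial(p - size - 1) / factorial(p)
                for subset in combinations(others, size):
                    s = frozenset(subset)
                    total += weight * (r2[s | {x}] - r2[s])
            values[x] = total
        return values

    values = shapley_values(R2, "ABC")
    print({k: round(v, 3) for k, v in values.items()})  # {'A': 0.3, 'B': 0.1, 'C': 0.1}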

Last, we re-base these so that they add up to 100% (see the short sketch after the list):

  • A: 60%
  • B: 20%
  • C: 20%
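
Continuing the sketch above, the re-basing step simply divides each value by their sum (which here equals the R-squared of the full model, .5):

    # Re-base the Shapley values so they sum to 100% of the explained variance.
    total = sum(values.values())
    shares = {k: round(100 * v / total) for k, v in values.items()}
    print(shares)  # {'A': 60, 'B': 20, 'C': 20}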

This analysis has changed our conclusions about the relative importance of the variables. The earlier simple analysis found that B was twice as important as C, but the Shapley regression shows they are equally important. Why? Because whenever C is added it increases the R-squared, whereas B has no effect when A is already in the model.

References

Budescu, D. V. (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114(3), 542.

Kruskal, W. (1987). Relative importance by averaging over orderings. The American Statistician, 41, 6-10.

Lipovetsky, S., & Conklin, M. (2001). Analysis of regression in game theory approach. Applied Stochastic Models in Business and Industry, 17(4), 319-330.

Lindeman, R. H., Merenda, P. F., & Gold, R. Z. (1980). Introduction to Bivariate and Multivariate Analysis. Glenview, IL: Scott, Foresman.