Displayr makes driver analysis both easy and fast. This post gives an overview of the key features in Displayr for performing driver analysis (i.e., working out the relative importance of predictors of brand performance, customer satisfaction, and NPS): the available driver analysis methods, stacking, options for missing data, in-built diagnostics for model checking and improvement, and how to create outputs from the driver analysis.

For more detail about what method to use when, see our driver analysis webinar and eBook.

## Choice of driver analysis method

All the widely used methods for driver analysis are available in Displayr. They are accessed via the same menu option, so you can toggle between them.

• Correlations: Insert > Regression > Driver analysis and set Output to Correlation. This method is appropriate when you are unconcerned about correlations between predictor variables.
• Jaccard coefficient/index: Insert > Regression > Driver analysis and set Output to Jaccard Coefficient (note that Jaccard Coefficient is only available when Type is set to Linear). This is similar to correlation, except it is only appropriate when both the predictor and outcome variables are binary.
• Generalized Linear Models (GLMs), such as linear regression and binary logit, and the related quasi-GLM methods (e.g., ordered logit): Insert > Regression > Linear, Binary Logit, Ordered Logit, etc. These address correlations between the predictor variables, and each of the different methods is designed for a different distribution of the outcome variable (e.g., linear for a numeric outcome, binary logit for a two-category outcome, ordered logit for an ordinal outcome).
• Shapley Regression: Insert > Regression > Driver analysis and set Output to Shapley Regression (note that Shapley Regression is only available when Type is set to Linear). This is a regularized regression, designed for situations where linear regression results are unreliable due to high correlations between predictors.
• Johnson's relative weights: Insert > Regression > Driver analysis. Note that this appears as Output being set to Relative Importance Analysis. As with Shapley Regression, this is a regularized regression, but unlike Shapley it is applicable to all Type settings (e.g., ordered logit, binary logit).
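As a rough illustration of the idea behind Shapley Regression (a sketch of the general technique, not Displayr's implementation), the model's R-squared is allocated across predictors by averaging each predictor's marginal contribution to R-squared over all subsets of the other predictors:

```python
# Sketch of Shapley importance for linear regression: each predictor's
# score is its average marginal contribution to R-squared across all
# subsets of the remaining predictors. Illustrative only.
import math
from itertools import combinations
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on the columns of X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

def shapley_importance(X, y):
    n_pred = X.shape[1]
    scores = np.zeros(n_pred)
    for j in range(n_pred):
        others = [k for k in range(n_pred) if k != j]
        for size in range(n_pred):
            # Shapley weight for subsets of this size
            weight = 1 / (n_pred * math.comb(n_pred - 1, size))
            for subset in combinations(others, size):
                base = r_squared(X[:, list(subset)], y) if subset else 0.0
                with_j = r_squared(X[:, list(subset) + [j]], y)
                scores[j] += weight * (with_j - base)
    return scores  # the scores sum to the full-model R-squared
```

Because every predictor's contribution is averaged over all orderings, highly correlated predictors share credit rather than one arbitrarily dominating, which is why this approach is preferred when multicollinearity makes ordinary regression coefficients unstable.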

## Stacking

Often driver analysis is performed using data for multiple brands at the same time. Traditionally this is addressed by creating a new data file that stacks the data from each brand on top of each other (see What is Data Stacking?). However, when performing driver analysis in Displayr, the data can be automatically stacked by:

• Checking the Stack data option.
• Selecting variable sets for Outcome and Predictors that contain multiple variables (for Predictors these need to be set as Binary - Grid or Number - Grid).
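To make the idea of stacking concrete, here is a minimal sketch using pandas, with hypothetical column names (one outcome and one predictor, measured for two brands): each respondent's per-brand columns become separate rows, one per respondent-brand combination.

```python
# Sketch of stacking wide brand-level data into long format for driver
# analysis. The column names (nps_*, value_*) are hypothetical.
import pandas as pd

wide = pd.DataFrame({
    "nps_BrandA": [9, 7], "nps_BrandB": [6, 8],
    "value_BrandA": [5, 4], "value_BrandB": [3, 5],
})

# pd.wide_to_long turns the per-brand columns into rows, so each row is
# one respondent-brand combination.
stacked = pd.wide_to_long(
    wide.reset_index(), stubnames=["nps", "value"],
    i="index", j="brand", sep="_", suffix=r"\w+",
).reset_index()
```

After stacking, a single driver analysis can be fitted across all brands at once, with the brand identifier available as a row-level variable.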

## Missing data

By default, all the driver analysis methods exclude all cases with missing data from their analysis (this occurs after any stacking has been performed). However, there are additional Missing data options that can be relevant:

• If using Correlation, Jaccard Coefficient, or Linear Regression, you can select Use partial data (pairwise correlations), in which case all the available data is used: even when a case is missing data for some predictors, its partial information still contributes to the analysis.
• If using Shapley Regression, Johnson's Relative Weights (Relative Importance Analysis), or any of the GLMs and quasi-GLMs, Multiple imputation can be used. This is generally the best method for dealing with missing data, except for situations where the Dummy variable adjustment is appropriate.
• If using Shapley Regression, Johnson's Relative Weights (Relative Importance Analysis), or any of the GLMs and quasi-GLMs, Dummy variable adjustment can be used. This method is appropriate when the data is missing because it cannot exist. For example, if the predictors are ratings of satisfaction with a bank's call center, branches, and website, and data is missing for people who have never used one of these channels, then this setting is appropriate. By contrast, if the data is missing because the person didn't feel like providing an answer, multiple imputation is preferable.
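The dummy variable adjustment can be sketched as follows (a generic illustration of the technique, not Displayr's code, with hypothetical column names): each missing value is replaced by a constant, and a paired indicator variable records where the replacement happened, so the indicator absorbs the effect of missingness in the fitted model.

```python
# Sketch of the dummy-variable adjustment for missing predictors:
# fill each missing value with a constant and add an indicator column
# flagging which values were filled. Illustrative only.
import numpy as np
import pandas as pd

def dummy_adjust(df, predictors, fill=0.0):
    out = df.copy()
    for col in predictors:
        missing = out[col].isna()
        # The indicator enters the regression alongside the predictor.
        out[col + "_missing"] = missing.astype(int)
        out[col] = out[col].fillna(fill)
    return out
```

Both the filled predictor and its indicator are then included in the driver analysis, so respondents who could not have answered (e.g., never used the call center) do not have to be dropped.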

## Diagnostics for model checking and improvement

A key feature of Displayr's driver analysis is that it contains many tools for automatically checking the data for problems, including VIFs and G-VIFs for highly correlated predictors, a test of heteroscedasticity, tests for outliers, and checks that the Type setting has been chosen correctly. Where Displayr identifies a serious issue, it shows an error. In other situations it shows a warning (in orange) and provides suggestions for resolving the issue.

One particular diagnostic that sometimes stumps new users is that by default Displayr sometimes shows negative importance scores for Shapley Regression and Johnson's Relative Weights. As both methods are defined under the assumption that importance scores must be positive, the appearance of negative scores can cause some confusion. What is happening is that Displayr also performs a traditional multiple regression and shows the signs from this on the relative importance outputs, as a warning to the user that the assumption of positive importance may not be correct. This can be turned off by checking Absolute importance scores.
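The sign overlay described above amounts to something like the following sketch (an illustration of the idea, not Displayr's internal code): the non-negative importance scores are displayed with the signs of the corresponding ordinary regression coefficients.

```python
# Sketch of signed importance display: non-negative importance scores
# take the signs of the matching OLS coefficients. Illustrative only.
import numpy as np

def signed_importance(raw_importance, ols_coefficients):
    """Attach OLS coefficient signs to non-negative importance scores."""
    return np.abs(raw_importance) * np.sign(ols_coefficients)
```

A negative displayed score therefore flags a predictor whose regression coefficient is negative, which may indicate that the positive-importance assumption is violated for that predictor.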

## Outputs

The standard output from all but the GLMs is a table like the one below. The second column of numbers shows the selected importance metric, and the first column shows this scaled to be out of 100.
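The scaling in the first column can be sketched as follows (a simple illustration; how ties and negative scores are handled here is an assumption, not necessarily Displayr's exact rule): each raw score is divided by the sum of the absolute raw scores and multiplied by 100.

```python
# Sketch of rescaling raw importance scores to sum to 100 (in absolute
# value, so any negative signed scores are preserved). Illustrative only.
import numpy as np

def scale_to_100(raw):
    raw = np.asarray(raw, dtype=float)
    return 100 * raw / np.abs(raw).sum()
```

This makes scores comparable across models with different numbers of predictors, since each scaled value reads directly as a share of total importance.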