Ready to improve the accuracy of your choice models? This is particularly useful if you want to save money on data collection. Today, I'll show you how to include covariates such as behavioral data, attitudes, and demographics in a discrete choice (CBC) model using Hierarchical Bayes.

What are choice models and covariates?

Let's say you're conducting a stated preference choice experiment. In this experiment, your respondents are asked a series of questions and must choose between alternatives with different attributes. For example, in a study on sports drinks, the alternatives may be different brands and the attributes may be price, bottle size, flavor, and sugar content. The mathematical model used to analyze these experiments is called a choice model, and the method is commonly known as choice-based conjoint (CBC) analysis.

The classic choice model does not include additional information about the respondents such as their income or age. In this post, I'll show you how we can modify the model to include this additional information. We'll be using Hierarchical Bayes (HB) to incorporate the covariates and to fit the model.

I'll discuss why we may want to include covariates in our choice models and explain how this can be done in an HB framework. I'll then demonstrate the approach using a discrete choice study examining fast food preferences.

A quick introduction to Hierarchical Bayes analysis with choice models

Before we begin adding respondent-specific variables (like demographics) to our discrete choice analyses, we need to quickly introduce Hierarchical Bayes (HB). You can read more about using Hierarchical Bayes for MaxDiff here.

Hierarchical Bayes is a powerful approach for analyzing data. It allows us to incorporate prior beliefs about model parameters, such as the part-worth means and variances. Using modern Markov chain Monte Carlo methods, we can efficiently fit models of the behavior of an entire market. In contrast to approaches that produce only point estimates, we obtain samples that describe the full distribution of the part-worths and other model parameters.

The 'hierarchical' in Hierarchical Bayes refers to the multilevel structure of the model. Parameters at each level can have their own distinct distributions. At the individual level, we model the within-respondent variation and specify a distribution for the individual part-worths. At the population level, we pool information across respondents and describe how part-worths vary in the entire population.
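
To make the two levels concrete, below is a minimal sketch in R of the generative structure, using hypothetical dimensions; the observed choices would then be modeled at the individual level given each respondent's part-worths.

```r
# Minimal sketch of the two-level structure (hypothetical dimensions):
library(MASS)  # for mvrnorm()

set.seed(1)
N <- 600  # respondents
K <- 6    # part-worths per respondent

# Population level: means and covariance shared across all respondents.
mu <- rnorm(K)
Sigma <- diag(K)  # identity covariance, purely for illustration

# Individual level: respondent i's part-worths are a draw beta_i ~ N(mu, Sigma).
beta <- mvrnorm(N, mu, Sigma)  # rows are respondents, columns are part-worths
```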

Why include respondent-specific covariates?

Recent advances in computing have made it possible to include respondent-specific covariates in HB choice models. There are several reasons to do this in practice. First, the information in the covariates may improve the estimates of the part-worths. This is more likely when respondents each answer fewer questions, so we have less information about each individual. Second, when respondents are segmented, we may worry that the estimates for one segment are biased by another segment. A related concern is that HB may shrink the segment means overly close to each other, which is especially problematic when sample sizes vary greatly between segments.

How to include covariates in the model (skip this section if you don't like math)

In the usual HB choice model, we model the part-worths for the ith respondent as βi ~ N(μ, Σ). Note that the mean and covariance parameters μ and Σ do not depend on i, and are the same for each respondent in the population. The simplest way to include respondent-specific covariates in the model is to modify μ so that it depends on the respondent's covariates.

We do this by modifying the model for the part-worths to βi ~ N(Θxi, Σ), where xi is a vector of known covariate values for the ith respondent and Θ is a matrix of unknown regression coefficients. Each row of Θ is given a multivariate normal prior, and the covariance matrix Σ is decomposed into a correlation matrix and a vector of scales, which each receive their own priors.
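
A hedged sketch of this extension in R, reusing N, K, Sigma, and mvrnorm() from the sketch above (P, X, and Theta are hypothetical):

```r
# Covariate extension: the population mean now varies by respondent.
P <- 3                                   # number of covariates
X <- matrix(rnorm(N * P), nrow = N)      # row i holds x_i, respondent i's covariates
Theta <- matrix(rnorm(K * P), nrow = K)  # unknown regression coefficients

# beta_i ~ N(Theta x_i, Sigma)
beta <- t(apply(X, 1, function(x_i) mvrnorm(1, as.vector(Theta %*% x_i), Sigma)))
```

If X were just a column of ones, Θ would collapse to a single column and we would recover the classic model with a common mean μ.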

If you know what you're doing, you can fit this model in any of R, Sawtooth, Q, or Displayr, among others.

Practical example

To demonstrate the approach, I will use data from a choice experiment involving preferences for cruise vacations, taken from the 2016 Sawtooth CBC Prediction Competition. Each of the 600 respondents answered 15 questions, each involving four alternative cruise vacation options. The options varied across a number of attributes, including price per person per day, destination, cruise line, number of days, room type, and number of amenities. The questions asked for one version of the design are shown below using a preview from Displayr.

To fit a choice model using the collected responses in Displayr, we select Insert > More > Choice Modeling > Hierarchical Bayes from the Ribbon at the top. Displayr fits the model using the No-U-Turn sampler implemented in Stan, state-of-the-art software for fitting Bayesian models such as ours. The software allows us to quickly and efficiently estimate our model without having to worry (much) about selecting tuning parameters, which are frequently a major hassle in Bayesian computation and machine learning. Once the model is fit, Displayr also provides a number of features for visualizing the results and diagnosing any issues with the model fit.
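
For readers who want to replicate this outside Displayr, here is a hedged sketch of driving Stan from R with the rstan package. The model file "hb_choice.stan" and the data list stan_data are hypothetical placeholders, not Displayr's internal code.

```r
# Hedged sketch: fitting a hierarchical choice model with rstan.
library(rstan)

fit <- stan(
  file   = "hb_choice.stan",  # your Stan model (hypothetical file name)
  data   = stan_data,         # choice tasks, attributes, and covariates
  chains = 8,                 # eight Markov chains, matching the results below
  iter   = 200                # iterations per chain
)
```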

A video showing all the steps to fit the model to a similar choice data set is available here. The Displayr output, including histograms of the respondent coefficients, appears below.

We'll want to check some of the diagnostics available under Insert > More > Choice Modeling > Diagnostics. Two useful diagnostics, Rhat and effective sample size, are available using Parameter Statistics from that menu, as are Trace Plots. For further discussion of the diagnostics, see this post. The results shown used 200 iterations and eight Markov chains, and held out one choice question for validation.
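
If you are working directly in R, the same diagnostics can be pulled from the stanfit object. A hedged sketch, assuming the fit object from the earlier sketch and a hypothetical parameter name "theta":

```r
# Convergence diagnostics from a stanfit object.
library(rstan)

stats <- summary(fit)$summary
head(stats[, c("n_eff", "Rhat")])  # Rhat near 1 and large n_eff are reassuring

traceplot(fit, pars = "theta")     # trace plots; "theta" is a hypothetical name
```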

After checking the diagnostics, we can now make inferences with our model. For example, we see that the respondents strongly prefer rooms with a balcony and also have a slight preference for shorter cruises.

Including covariates in the model

Next, I will re-fit the model including a covariate. In Displayr, covariates are added to the model by dragging variables from the Data Sets tab on the right into the box labeled "Respondent-specific covariates", or by using the "ADVANCED" tab in the object inspector on the right. For demonstration, I fit a model including a categorical variable indicating the respondent's favorite cruise line, which was asked as a separate question before the choice questions. Since cruise line is an attribute in the design that we might expect to matter, we expect this covariate to be informative as well. The Displayr output appears after the short sketch below.
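
For R users following the earlier sketches, a categorical covariate like this would enter the model as dummy-coded columns of the covariate matrix X. A minimal sketch, using placeholder labels rather than the actual cruise lines from the study:

```r
# Dummy-coding a categorical covariate into the covariate matrix X.
# "Line A"/"Line B"/"Line C" are placeholders, not the study's cruise lines.
favorite <- factor(sample(c("Line A", "Line B", "Line C"), N, replace = TRUE))
X <- model.matrix(~ favorite)  # intercept plus one dummy column per extra level
head(X)
```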

We see that our prediction accuracy on the held-out task has improved, and that the mean root likelihood (RLH) statistic has improved slightly as well. A more complete analysis would use more iterations and include all the covariates of interest.
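
For reference, a respondent's RLH is the geometric mean of the model's predicted probabilities for the choices they actually made, so with four alternatives per question, chance level is 0.25. A tiny sketch with made-up probabilities:

```r
# Root likelihood (RLH) for one respondent, with made-up predicted
# probabilities of their chosen alternatives across four tasks.
probs <- c(0.41, 0.55, 0.33, 0.62)
rlh <- prod(probs)^(1 / length(probs))  # geometric mean
rlh  # compare against the 0.25 chance level for four alternatives
```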

To see how to add covariates to a Hierarchical Bayes MaxDiff analysis, see this blog post.

You can play around with the data yourself using this link. Don't forget to subscribe to the blog for more data science insights!