Choice experiments, also known as choice-based conjoint (CBC), are widely used for predicting the performance of new products and changes to products' designs and portfolios. This post reviews some of the techniques that can be used to improve the accuracy of a choice experiment's predictions.

So, how accurate are choice experiments?

Clients of choice experiments often request statements regarding the predictive accuracy of the technique. Unfortunately, only crooks provide such assurances. While many thousands of choice experiments have been conducted, the accuracy of only a handful has ever been published. This tells us two things: it is hard to assess their accuracy, and the published record is too slight to be informative.

The difficulty in testing is largely because the things that are tested in choice experiments are rarely the same as those that are implemented in the real world. For example, a road planning choice model may test a proposed tunnel, but the final tunnel is built in a different location for political or economic reasons. A new type of phone may have been tested, but the final model that is launched may have different features. Or in some circumstances, the product may not have been launched or the feature may not have been changed, because the choice model found it would be a bad idea.

There are only a handful of published studies assessing the impact of choice modeling experiments. To make up a number, perhaps less than 0.01% of studies have ever been reported in a way that permits an assessment of their accuracy. Such a record is too slight and selective to be meaningful.

Despite this, while it is never possible to state with real confidence that a model is accurate to some level, there are many things that can be done to improve the predictive accuracy of models. This post describes 12 ways of improving the accuracy of forecasts from a choice modeling study.

1. Simple, easy-to-complete questions

The harder it is for a person to complete a choice question, the more likely they are to only read some of the information, to guess, or to give up entirely. This means that it is in your best interests to make the questions as simple as possible. In practice, there are a number of ways of making experiments easy:

  1. Brief, easy-to-understand descriptions of attributes and product levels.
  2. Minimizing the number of attribute levels that appear in a question. One rule of thumb is that no more than 20 should appear in a single question (e.g., if there are four alternatives, this means a maximum of five attributes per alternative). Typically this is achieved by a mixture of reducing the number of attributes and using partial profile choice experiments (which present people with a subset of the attributes in each question).
  3. Not asking people too many questions.

2. Ecological validity

An experiment has ecological validity when information is presented to a consumer in a way that is comparable to how it is presented in the real world. The more similar, the more likely the study will give valid results, all else being equal. Of course, the great challenge is that all else is rarely equal. For example, detailed information about ingredients of food products is usually written on the back of packets in really small fonts. While it is possible to mimic such a thing in a choice modeling study with the goal of improving ecological validity, doing so goes against the whole idea of asking simple and easy-to-complete questions. Similarly, showing fake supermarket shelves with dozens of products on them should increase ecological validity, but it comes at the expense of making things difficult for the user. There are no straightforward ways of making such tradeoffs.

3. Incentive compatible

A question is incentive compatible if it is written in such a way that it is in the interests of the respondent to tell the truth. One of the great strengths of conjoint is that it suffers from fewer incentive compatibility issues than simpler techniques. For example, if you ask somebody "what's the most you will pay for this product?" they will likely guess that you may use this information to offer them the product at a higher price, and answer accordingly. By contrast, conjoint forces people to choose between alternatives, making it harder for the respondent to game their responses.

The general advice is to give people a reason to believe that their accurate answers are important (e.g., "We will use the data from this question to improve our product, so the more accurate your answers, the more likely you will get something you like!").

4. Use hierarchical Bayes (HB)

The history of choice modeling is long, and many models have been developed along the way. The clearest lesson from that history is that hierarchical Bayes (HB) models generally do the best job of explaining the variation between people. There are situations where other models outperform hierarchical Bayes, but these are the exceptions rather than the rule, and you should always start with an HB model.

See "How to Use Hierarchical Bayes for Choice Modeling in Displayr" for your step by step guide!

5. Test alternative models

Predictive accuracy can be further improved by testing alternative models, where the main alternatives of interest are latent class models, multi-class HB models, models with interactions, and models with covariates. Comparisons should generally be based on predictive accuracy using data not used in the estimation of the models.
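
As a rough illustration, here is a minimal sketch in Python (using numpy) of comparing two models on holdout questions via hit rate and mean log-likelihood; the predicted probabilities and observed choices below are made up rather than coming from fitted models:

```python
import numpy as np

# Hypothetical holdout data: 5 choice questions, 3 alternatives each.
# Each row holds a model's predicted choice probabilities for one question
# (the numbers are made up; in practice they would come from fitted models).
probs_hb = np.array([[0.6, 0.3, 0.1],
                     [0.2, 0.5, 0.3],
                     [0.7, 0.2, 0.1],
                     [0.4, 0.4, 0.2],
                     [0.1, 0.3, 0.6]])
probs_latent_class = np.array([[0.5, 0.4, 0.1],
                               [0.3, 0.4, 0.3],
                               [0.6, 0.3, 0.1],
                               [0.5, 0.3, 0.2],
                               [0.2, 0.3, 0.5]])
choices = np.array([0, 1, 0, 1, 2])  # alternatives actually chosen in the holdout questions

def hit_rate(probs, choices):
    """Share of holdout questions where the highest-probability alternative was chosen."""
    return np.mean(probs.argmax(axis=1) == choices)

def mean_log_likelihood(probs, choices):
    """Average log probability the model assigned to the chosen alternatives."""
    return np.mean(np.log(probs[np.arange(len(choices)), choices]))

for name, p in [("HB", probs_hb), ("Latent class", probs_latent_class)]:
    print(name, hit_rate(p, choices), mean_log_likelihood(p, choices))
```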

6. Use ensembles

Where there are multiple models that all seem to do relatively well, prediction can be improved by using an average of the predictions of the multiple models. The resulting joint model is known as an ensemble. This tends to improve only the decimal places, so while it is an easy win it is best to only do it when accuracy is key and all the other wins have been taken.
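
A minimal sketch of the idea, assuming we already have predicted choice probabilities from two models on the same questions (the numbers are made up):

```python
import numpy as np

# Predicted choice probabilities from two models on the same questions (made-up numbers).
probs_hb = np.array([[0.6, 0.3, 0.1],
                     [0.2, 0.5, 0.3]])
probs_lc = np.array([[0.5, 0.4, 0.1],
                     [0.3, 0.4, 0.3]])

# The ensemble is simply the (equally weighted) average of the models' predictions.
# Unequal weights, e.g. favoring the model with better out-of-sample accuracy, also work.
ensemble = (probs_hb + probs_lc) / 2
print(ensemble)
```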

Check out "Comparing Choice Models and Creating Ensembles" for more information!

7. Changing the scale effect and choice rules

Where predictive accuracy is important for forecasting purposes, it is prudent to conduct the study in a way that allows it to make predictions about historic events. This could include predictions about:

  • A person's most recent decisions.
  • Current market share.
  • Market share by geographic area.
  • The effects of historic price changes.
  • Historic changes in products or their attributes (e.g., new product launches, changes in product design).

When such predictions are made, it is normally the case that the model will be off by some amount (sometimes by a very large amount). By adjusting the model so it better predicts the past, we increase the chance of predicting the future with accuracy.

Typically, a choice model's predictions are computed from the utilities using the logit (softmax) transformation. For example, if there are three alternatives in a particular question, and we compute the total utility of each by adding up the utilities of its attribute levels, then the probability that option A will be chosen is computed as exp(A) / (exp(A) + exp(B) + exp(C)). We can modify the predictions of a choice model by multiplying each of the utilities by a constant, d: exp(dA) / (exp(dA) + exp(dB) + exp(dC)). Values of d greater than 1 increase the probability that a person is predicted as choosing the alternative with the highest utility, and values less than 1 reduce that probability.

This is not as dodgy as it may first appear. We should expect some level of noise when people answer questions. We should also expect some level of noise in real-world decision-making (e.g., it is hard to make the best choice when you are in a rush and the kids are behaving badly). There is no reason to expect the levels of noise to be consistent between the data collection and the real world, and adjusting the value of d, which is called the scale effect, can correct for this.
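
As a rough sketch, the calculation might look like the following, where the utilities are made-up numbers and the function is just the logit formula above with the scale factor d:

```python
import numpy as np

def choice_probabilities(utilities, d=1.0):
    """Logit choice probabilities with a scale factor d applied to the utilities."""
    scaled = d * np.asarray(utilities, dtype=float)
    exp_u = np.exp(scaled - scaled.max())  # subtracting the max is for numerical stability only
    return exp_u / exp_u.sum()

utilities = [1.0, 0.5, -0.2]  # made-up total utilities for alternatives A, B, and C
print(choice_probabilities(utilities, d=1.0))  # baseline predictions
print(choice_probabilities(utilities, d=3.0))  # d > 1: closer to "choose the best"
print(choice_probabilities(utilities, d=0.3))  # d < 1: closer to equal shares
```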

Consequently, where we have historic data, we can improve our predictions by modifying the scale effect. There are quite a few different ways of doing this, involving combinations of the following:

  • Working out the best value of d for the entire study, by respondent, or by segment (see the sketch after this list).
  • Using different choice rules. For example, assuming that people will always choose the alternative with the highest utility (which is equivalent to setting d close to infinity), or estimating one value of d for the three most preferred alternatives and a separate value for the less preferred alternatives.
  • Using respondent-level utilities, utilities from the draws for each respondent, or utilities from the hierarchical distributions.
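
For example, here is a minimal sketch of the first of these approaches: a grid search for the single value of d that best reproduces a set of historic market shares (the respondent-level utilities and observed shares are made up):

```python
import numpy as np

def predicted_shares(utilities, d):
    """Market shares: scaled-logit probabilities averaged over respondents."""
    scaled = d * utilities
    exp_u = np.exp(scaled - scaled.max(axis=1, keepdims=True))
    probs = exp_u / exp_u.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)

# Made-up respondent-level utilities (rows = respondents, columns = alternatives)
# and made-up historic market shares to calibrate against.
rng = np.random.default_rng(0)
utilities = rng.normal(size=(200, 3)) + np.array([0.5, 0.0, -0.5])
observed_shares = np.array([0.5, 0.3, 0.2])

# Grid search for the single scale factor that best reproduces the historic shares.
grid = np.linspace(0.1, 5.0, 50)
errors = [np.sum((predicted_shares(utilities, d) - observed_shares) ** 2) for d in grid]
best_d = grid[int(np.argmin(errors))]
print("best scale factor:", best_d)
```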

8. Calibrating utilities

Sometimes, systematic differences will be identified that cannot be addressed via scale effects and choice rules. For example, a particular brand may have a substantially lower predicted share from the choice model than in reality. This can be addressed by adjusting the utilities for this brand until the model's predictions are consistent with historic data. This can be done by adding a constant to all the respondents' utilities, or by performing adjustments at the respondent or segment level.
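
A minimal sketch of this kind of calibration, using made-up respondent-level utilities and a made-up historic share for one brand:

```python
import numpy as np

def shares(utilities):
    """Logit choice probabilities averaged over respondents."""
    exp_u = np.exp(utilities - utilities.max(axis=1, keepdims=True))
    probs = exp_u / exp_u.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)

rng = np.random.default_rng(1)
utilities = rng.normal(size=(200, 3))  # made-up respondent-level utilities (column 0 is the brand to calibrate)
target_share = 0.45                    # made-up historic market share for that brand

# Search for the constant which, added to that brand's utility for every respondent,
# makes its predicted share match the historic share.
grid = np.linspace(-3, 3, 601)
best = min(grid, key=lambda c: abs(shares(utilities + np.array([c, 0.0, 0.0]))[0] - target_share))
print("calibration constant:", best)
```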

9. Adding switching costs into models

It is often the case that choice models predict much higher levels of switching than occur in the real world. This is particularly the case in "set and forget" markets like cable TV and electricity. One way of addressing this is to assume that people choose in two stages: stage 1 is to decide whether to switch, and stage 2 is to choose their alternative. There are a number of ways of building such an assumption into a choice model's predictions:

  1. Assume that the stage 1 model is a fixed probability (e.g., people have a 10% chance of switching), where the fixed probability is estimated by predicting historic switching rates.
  2. Using machine learning or statistical techniques to estimate the probability based on historic data (i.e., identifying the people most likely to switch based on historic behavior).
  3. Making people more likely to switch when the appeal of the alternatives increases. A simple way of doing this is to make the switching probability 1/(1 + exp(-s - logsum)), where s is a constant to be estimated using historic data, and logsum is the log of the denominator of the logit formula (i.e., if there are three alternatives with utilities of A, B, and C, respectively, then their logsum is log(exp(A) + exp(B) + exp(C))). This approach is sketched below.
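
Here is a minimal sketch of the third approach, using made-up utilities and a made-up value of s:

```python
import numpy as np

def switch_probability(utilities, s):
    """Stage 1: probability of switching, driven by the appeal of the alternatives (the logsum)."""
    logsum = np.log(np.sum(np.exp(utilities)))
    return 1.0 / (1.0 + np.exp(-s - logsum))

def two_stage_shares(utilities, s):
    """Stage 2: logit shares among the alternatives, conditional on switching."""
    p_switch = switch_probability(utilities, s)
    exp_u = np.exp(utilities - np.max(utilities))
    conditional_shares = exp_u / exp_u.sum()
    return p_switch, p_switch * conditional_shares

utilities = np.array([0.8, 0.2, -0.5])  # made-up utilities of the alternatives on offer
s = -3.0                                # made-up constant; in practice estimated from historic switching rates
p, alternative_shares = two_stage_shares(utilities, s)
print("probability of switching:", p)
print("unconditional shares of the alternatives (1 - p stay put):", alternative_shares)
```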

10. Distribution and awareness multipliers

A simple adjustment that can be performed when predicting share is to add in an assumed constant indicating the degree of distribution and/or awareness of each brand. Where a is the distribution or awareness for the first alternative, b for the second, and c for the third, the share of the first alternative becomes a exp(A) / (a exp(A) + b exp(B) + c exp(C)).
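
A minimal sketch of this adjustment, with made-up utilities and multipliers:

```python
import numpy as np

def adjusted_shares(utilities, multipliers):
    """Logit shares with a distribution/awareness multiplier applied to each alternative."""
    weighted = np.asarray(multipliers) * np.exp(utilities - np.max(utilities))
    return weighted / weighted.sum()

utilities = np.array([1.0, 0.5, -0.2])   # made-up utilities for alternatives A, B, and C
multipliers = np.array([0.9, 0.6, 1.0])  # assumed distribution/awareness for each alternative
print(adjusted_shares(utilities, multipliers))
```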

11. Respondent-specific scenarios

Often the alternatives that are available to people differ in the real world. Thinking of transport, for example, where you live determines the attributes of the different transport options (e.g., waiting times for buses, relative crowding, ticket prices). Similarly, with consumer products, whether you shop at a supermarket or at a local grocery store affects what is available to you and how much you will pay. Predictions can be improved by making adjustments for such things at the respondent level. Similarly, distribution and awareness can be adjusted at the respondent level.
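
One way to sketch this is to zero out the alternatives that are unavailable to each respondent before computing their shares (the utilities and availability below are made up):

```python
import numpy as np

def respondent_shares(utilities, available):
    """Logit shares computed per respondent, using only the alternatives available to that respondent."""
    exp_u = np.exp(utilities - utilities.max(axis=1, keepdims=True)) * available
    return exp_u / exp_u.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
utilities = rng.normal(size=(4, 3))  # made-up utilities for 4 respondents and 3 alternatives
available = np.array([[1, 1, 1],     # respondent 1 has access to every alternative
                      [1, 0, 1],     # respondent 2 cannot get alternative 2
                      [1, 1, 0],
                      [0, 1, 1]])
shares = respondent_shares(utilities, available)
print(shares)
print("market shares:", shares.mean(axis=0))
```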

12. Mr P (multilevel regression and poststratification)

Where utilities are related to demographic and geographic variables, predictions can be improved using a technique referred to as "Mr P", which involves building models predicting the utilities from those variables and then simulating using census data.
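
A minimal sketch of the poststratification step, assuming the predicted shares by demographic cell have already been produced (the cells, predictions, and census proportions are made up):

```python
import numpy as np

# Made-up predicted preference shares by demographic cell (e.g., age group x region).
# In Mr P these would come from a multilevel model of the utilities.
cells = ["18-34 urban", "18-34 rural", "35+ urban", "35+ rural"]
predicted_share = np.array([0.62, 0.55, 0.40, 0.35])    # predicted share choosing the product in each cell
census_proportion = np.array([0.25, 0.10, 0.40, 0.25])  # assumed population proportions from census data

# Poststratification: weight each cell's prediction by its population proportion.
population_prediction = np.sum(predicted_share * census_proportion)
print("poststratified prediction:", population_prediction)
```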

Want more tips on choice modeling? Learn how to create an online choice simulator!