03 August 2017
MaxDiff analysis is one of those advanced techniques that can be run by any good quant researcher. Yes, the modern algorithms have their fair share of rocket science in them, but with a bit of experience they can be used without a deep understanding of how the rocket science works. In this post I share 11 tips to help people who are relatively new to the field.
If you are a MaxDiff analysis novice, please check out A Beginner’s Guide to MaxDiff analysis before reading this post.
1. Keep it simple (particularly if it is your first MaxDiff analysis)
MaxDiff projects come in all shapes and sizes. They vary on the following dimensions:
- Sample size. I have seen them from 40 respondents all the way up to more than 6,000.
- The number of separate MaxDiff experiments in the study. Typically people have only a single experiment, but I have seen as many as five in the one project (one for each market segment, with different attributes in each).
- How many alternatives there are (e.g., 10 in a small study, 100 in a huge study).
- The number of versions of the experimental design, from 1 through to hundreds.
- How many times each alternative is shown to each person. At least 3 is good practice. With fewer than this, you need to rely heavily on statistical wizardry.
- The number of separate analysis objectives (e.g., segmentation, cross-category segmentation, profiling).
- The speed with which the analysis needs to be done, from hours through to months.
If it is your first study, make sure you “go safe” (simple) on each of these dimensions. That is, if possible, have a small sample size, a single experiment, a small number of alternatives, only one version, show each alternative to each person 3 times, have a single and clear analysis objective (e.g., segmentation), and set aside a lot of time (think weeks if it is your very first study).
If you find yourself needing to “go large” on your first project, there are a couple of simple hacks. The obvious one is to outsource. The other is to have a go at recreating and understanding old studies you have outsourced, or to work through case studies, such as the MaxDiff analysis examples here.
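The “show each alternative 3 times” guideline translates directly into questionnaire length. A minimal sketch, assuming a balanced single-version design in which every question shows the same number of alternatives (the 5-alternatives-per-question figure in the example is purely illustrative):

```python
def questions_needed(n_alternatives: int, per_question: int, times_shown: int = 3) -> int:
    """Minimum number of MaxDiff questions so that each alternative can be
    shown `times_shown` times, assuming a balanced single-version design
    with `per_question` alternatives in every question."""
    slots_needed = n_alternatives * times_shown
    # Integer ceiling division: a partially filled question still counts as one.
    return (slots_needed + per_question - 1) // per_question

print(questions_needed(10, 5))   # 6 questions for a small study
print(questions_needed(100, 5))  # 60 questions for a huge study
```

The jump from 6 to 60 questions is one reason large studies end up needing multiple versions.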
2. Avoid having multiple versions unless you really, really need them
Having multiple versions of an experimental design causes two problems. First, it is not ideal for segmentation: the point of segmentation is to group together people who are similar, and when different people see different combinations of alternatives, it is harder to work out whether they are similar. Second, it is an additional complication that can cause errors in fieldwork or analysis. So, unless you have a good reason to have multiple versions, such as too many alternatives for a single version, in my opinion it is best to stick to one version.
3. Estimate your models before you are out of field
Do you wear a seat belt when you drive? The MaxDiff (and choice modelling) equivalent of a seat belt is to run all your models based on the initial interviews, ideally halting interviewing until your initial analysis is complete. I have done this on every single consulting project I have ever performed. If it is your first MaxDiff project, and you don’t do this, you are choosing to learn to drive without a seat belt.
Estimating your models before fieldwork is complete achieves three goals. It:
- Allows identification of problems in the experimental design while there is still time to fix them. See also How to Check an Experimental Design (MaxDiff, Choice Modeling).
- Leads to fast identification of fieldwork problems when there is time to fix them.
- Provides forewarning of likely analysis problems. Some of the more common analysis problems are described below. All of them take time to fix. If you discover them while in field, they can usually be solved before it becomes a client-facing car crash.
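Two basic design checks can be run as soon as the design exists: how often each alternative is shown, and how often each pair of alternatives appears together. The sketch below is a minimal illustration, assuming a design is stored as a list of questions, each a tuple of alternative indices; it is not the procedure from the article referenced above, and the 7-alternative design is hypothetical.

```python
from collections import Counter
from itertools import combinations

def check_design(design, n_alternatives):
    """Summarise a MaxDiff design given as a list of questions, each a tuple
    of alternative indices in 0 .. n_alternatives - 1."""
    # How often each alternative is shown (ideally equal, and at least 3).
    appearances = Counter(alt for question in design for alt in question)
    # How often each pair of alternatives is shown together (ideally similar).
    pairs = Counter(
        pair for question in design for pair in combinations(sorted(question), 2)
    )
    never_shown = [a for a in range(n_alternatives) if appearances[a] == 0]
    # Questions that accidentally show the same alternative twice.
    repeated = [q for q in design if len(set(q)) != len(q)]
    return appearances, pairs, never_shown, repeated

# Hypothetical design: 7 alternatives, 3 per question, 7 questions.
design = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
appearances, pairs, never_shown, repeated = check_design(design, n_alternatives=7)
```

In this example every alternative appears 3 times and every pair appears together exactly once; uneven counts in a real design are worth investigating before fieldwork continues.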
4. Start with latent class analysis
There are five basic ways to analyze MaxDiff experiments:
- Counting analysis (aka counts analysis).
- Latent class analysis.
- Standard Hierarchical Bayes.
- Mixtures of Normals with a Full Covariance Matrix, either estimated using Maximum Simulated Likelihood (in Q) or Hierarchical Bayes (via the R package ‘bayesm’).
- More complex models, such as varying coefficients, constrained mixtures of normals, non-normal distributions, generalized mixed logit, etc.
The first of these, counting analysis, is invalid. You are better off using a traditional rating scale than using counting analysis with MaxDiff.
Latent class analysis is the safest of all of these methods.
Why is latent class analysis the safest?
Latent class analysis has a few big advantages over the more complex methods:
- It is easy to interpret. All the other advanced methods require either technical expertise or post-processing of respondent-level data in order to interpret them correctly. With latent class analysis, you just look at the segment sizes and descriptions, which makes it simple to understand. Simple to understand means you quickly find problems and/or insights.
- It is the best default method for segmentation. Latent class analysis creates segments. This is precisely what it is designed to do. Yes, there are other methods that can also create segments (see Tip 11). However, they are two-step methods (first compute respondent-level results, then cluster them), and errors are introduced in each step. On the other hand, latent class analysis involves only a single step and thus, all else being equal, involves less error.
- It is a safe model. The standard Hierarchical Bayes model, which is available in Sawtooth, is usually, in a statistical sense, a bit better than latent class analysis. But, it can sometimes be much worse (particularly for segmentation). In particular, if there are a small number of discrete segments, latent class analysis will likely find them but the standard Hierarchical Bayes model will likely not.
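Latent class models typically assign each respondent a probability of belonging to each segment, computed with Bayes' rule from the segment sizes and how well each segment's utilities explain that respondent's choices. A minimal sketch, assuming the per-segment log-likelihoods have already been computed by whatever software you are using (the function name and inputs are illustrative, not any package's API):

```python
import math

def posterior_membership(segment_sizes, segment_log_likelihoods):
    """Posterior probability that one respondent belongs to each segment.

    segment_sizes: prior segment shares (summing to 1).
    segment_log_likelihoods: log-likelihood of the respondent's choices
    under each segment's utilities."""
    log_weights = [math.log(s) + ll for s, ll in zip(segment_sizes, segment_log_likelihoods)]
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical: segments of 60% and 40%; the respondent's choices fit segment 2 better.
probs = posterior_membership([0.6, 0.4], [-10.0, -8.0])
```

Here the smaller segment wins (about 0.83) because the respondent's choices are much better explained by its utilities.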
See How MaxDiff Analysis Works (Simplish, but Not for Dummies) for an introduction to MaxDiff analysis.
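The model-based methods in Tip 4 rest on a model of how a person picks the best and worst alternatives in each question. One common formulation, assumed here for illustration (software such as Q and Sawtooth may use variants), is a sequential logit: the best choice is a logit over the utilities of the alternatives shown, and the worst choice is a logit over the negated utilities of those that remain. The utilities below are hypothetical.

```python
import math

def best_worst_probability(utilities, best, worst):
    """Probability that `best` is chosen as best and `worst` as worst in one
    MaxDiff question, under an assumed sequential best-worst logit.

    `utilities` maps each alternative shown in the question to its utility."""
    if best == worst:
        return 0.0  # the same alternative cannot be both best and worst
    # Best choice: logit over all alternatives shown.
    p_best = math.exp(utilities[best]) / sum(math.exp(u) for u in utilities.values())
    # Worst choice: logit over negated utilities of the alternatives that remain.
    remaining = [u for a, u in utilities.items() if a != best]
    p_worst = math.exp(-utilities[worst]) / sum(math.exp(-u) for u in remaining)
    return p_best * p_worst

# Hypothetical utilities for the alternatives shown in one question.
shown = {"Apple": 2.0, "Samsung": 1.0, "Sony": 0.0}
p = best_worst_probability(shown, best="Apple", worst="Sony")
```

Multiplying these probabilities across a respondent's questions gives the likelihood that latent class analysis and Hierarchical Bayes work with.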
5. Increase the number of segments if you get ‘Other’ or correlated segments
Latent class analysis can lead to uninteresting segments. There are two common flavors of uninteresting segments that arise in latent class analysis:
- Correlated segments. Segments that have the same top alternative(s) and differ only in the relativities among the less preferred alternatives. If you are looking at preference shares (aka probability %), segments can even appear identical, because the differences all relate to preferences for the least preferred alternatives, whose shares all round to 0%.
- An ‘Other’ segment, where everything is somewhat important and few alternatives are unimportant.
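The rounding effect is easy to reproduce. A minimal sketch, assuming preference shares are the logit (softmax) transformation of utilities, with hypothetical utilities for two segments that differ only in their two least preferred brands:

```python
import math

def preference_shares(utilities):
    """Logit (softmax) transformation of utilities into percentage shares."""
    top = max(utilities.values())
    exps = {a: math.exp(u - top) for a, u in utilities.items()}
    total = sum(exps.values())
    return {a: 100 * e / total for a, e in exps.items()}

# Hypothetical segments: same top three brands, but the bottom two swap places.
segment_a = {"Apple": 5.0, "Samsung": 2.0, "Sony": 1.0, "Google": -3.0, "Microsoft": -6.0}
segment_b = {"Apple": 5.0, "Samsung": 2.0, "Sony": 1.0, "Google": -6.0, "Microsoft": -3.0}

shares_a = preference_shares(segment_a)
shares_b = preference_shares(segment_b)
rounded_a = {a: round(s) for a, s in shares_a.items()}
rounded_b = {a: round(s) for a, s in shares_b.items()}
print(rounded_a == rounded_b)  # True: the differences vanish after rounding
```

Looking at the utilities, or at shares with more decimal places, reveals the difference.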
The 6-segment latent class solution below illustrates both of these types of poor segments. This segmentation looks at preferences for tech companies. Segments 1 and 3 both show strong preferences for Samsung. Segments 2, 4, and 6 strongly prefer Apple. Two of the segments (5, and to an extent 4) have highly mixed preferences.
Compare segments 2 and 6. People in these segments have a very strong preference for Apple. As Apple is the most preferred brand in the study, it makes sense that Apple devotees would be split into multiple segments in terms of their secondary preferences. In segment 2, the hardware brands Samsung and Sony are the second and third most preferred brands. In segment 6, by contrast, the secondary preferences are for the software brands, Google and Microsoft. These segments make sense. It is just that, from a managerial perspective, they are perhaps not very interesting, and latent class analysis focuses only on the statistical properties of the data rather than on managerial significance.
Another cause of correlated segments is where people have the same basic preferences, but they differ in the amount of noise (i.e., inconsistencies) in their data.
Yet another cause of correlated segments is when a dominant attribute has been included. For example, if you have an attribute like “taste” in a study of food or drinks, or “quality” in a technology market, there is a good chance it will be important in all of your segments.
The ‘Other’ segment can occur for two reasons: because you force together lots of very small segments, or because there is a segment of people who answer questions in a highly inconsistent fashion.
A solution to this problem is often just to increase the number of segments, rather than using judgement to merge together similar segments. This does not always solve the problem. Tip 11 provides a different solution which usually does the job.
6. Switch to a continuous mixture model if you get ‘Other’ or correlated segments
Latent class analysis assumes that there are a number of discrete segments in the data. This assumption can be wrong, and when it is, it can manifest itself via uninteresting segments (see Tip 5). Most of the more complicated models instead assume that people vary along continua. The simplest of these is the standard Hierarchical Bayes model, but see also the fourth and fifth models described in Tip 4.
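To make the contrast concrete: rather than a handful of segment profiles, a continuous mixture model treats each person's utilities as a draw from a distribution. The minimal simulation below assumes independent normal distributions for each alternative (a simplification; the Tip 4 models with a full covariance matrix also let utilities be correlated), with hypothetical means and spreads. It simulates respondents; it does not estimate a model.

```python
import math
import random

def softmax(utilities):
    """Logit transformation of one respondent's utilities into shares."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def draw_respondents(means, sds, n, seed=0):
    """Simulate respondent-level utilities from independent normals
    (a diagonal-covariance simplification of a continuous mixture)."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(means, sds)] for _ in range(n)]

def average_shares(respondents):
    """Average respondent-level preference shares across the sample."""
    shares = [softmax(u) for u in respondents]
    return [sum(col) / len(shares) for col in zip(*shares)]

means = [1.0, 0.5, 0.0]  # hypothetical mean utilities for three alternatives
sds = [1.0, 1.0, 1.0]    # hypothetical spread of preferences across people
sample = draw_respondents(means, sds, n=1000)
avg = average_shares(sample)
```

Every simulated respondent has a slightly different profile, so there is no natural set of discrete segments to find.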
7. Compare multiple models
While you should start with latent class analysis, and it will often do the job if your focus is segmentation, it is usually a good idea to compare different models. This is doubly true if you have an interesting model. Three basic ways of choosing between models are:
- Statistical comparisons. The best approach is usually to compare models using cross-validation, but most latent class analysis software does not support this, so you can use the BIC instead.
- Based on the extent to which their respondent-level preference shares are correlated with demographics. I discuss respondent-level preference shares in Tip 11.
- Strategic usefulness. If you have two different models, and there is no way to separate them on statistical grounds, it is reasonable to choose the model which gives the most interesting strategic conclusions.
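For the statistical comparison, the BIC is computed as -2 × log-likelihood + (number of parameters) × ln(n), with lower values preferred. The sketch below assumes a latent class model in which each segment has one utility per alternative (one fixed at zero for identification) plus free segment sizes, and takes n to be the number of respondents; software packages differ in these conventions, and the log-likelihoods are hypothetical.

```python
import math

def latent_class_bic(log_likelihood, n_segments, n_alternatives, n_respondents):
    """BIC for a latent class MaxDiff model (lower is better).

    Assumed parameter count: (n_alternatives - 1) free utilities per segment,
    plus (n_segments - 1) free segment sizes."""
    n_params = n_segments * (n_alternatives - 1) + (n_segments - 1)
    return -2 * log_likelihood + n_params * math.log(n_respondents)

# Hypothetical fitted log-likelihoods for 1-, 2- and 3-segment solutions
# with 10 alternatives and 500 respondents.
fits = {1: -5200.0, 2: -5000.0, 3: -4950.0}
bics = {k: latent_class_bic(ll, k, 10, 500) for k, ll in fits.items()}
best = min(bics, key=bics.get)
```

The penalty term grows with every extra segment, so a bigger model has to improve the log-likelihood enough to pay for its additional parameters.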
8. Use a powerful computer
These days most computers are OK for most problems. Large MaxDiff studies, however, are not a normal problem. Some big analyses can take days to run a single model. A faster computer (more memory, a faster CPU) can do things in one-tenth the time of a slow old clunky one. If you have a large sample size (e.g., more than 2,000), a large number of versions, or a large number of alternatives, you will really appreciate using a powerful computer. If you fail to heed this advice, the best case is that you spend a lot of time waiting. The worst case is that your computer crashes and you cannot get any results without a faster computer.
9. Run multiple models at the same time
Usually you will need to run multiple models; at the very least, latent class analysis and a continuous mixture model for comparison. If the models are really slow, the simple solution is to run multiple models at the same time. If you are using a cloud-based app like Displayr, you can get it to do this for you. If you are using a desktop program like Sawtooth or Q, the simple hack is to open the program multiple times, perhaps even on multiple computers.
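If your models can be fitted from a script, the same idea can be automated. A minimal sketch using Python's standard `concurrent.futures`; `fit_model` is a hypothetical placeholder for your own estimation routine, and the model specifications are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_model(spec):
    """Hypothetical placeholder: replace with a call to your real estimation
    routine. Here it just echoes the specification it was given."""
    name, n_segments = spec
    return {"model": name, "segments": n_segments}

def fit_all(specs, max_workers=4):
    """Fit several model specifications concurrently; results come back
    in the same order as `specs`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fit_model, specs))

specs = [("latent class", 3), ("latent class", 5), ("hierarchical bayes", 1)]
results = fit_all(specs)
```

Threads keep the example portable and work well when the heavy lifting happens outside Python (e.g., an external program). For pure-Python, CPU-bound fitting, `ProcessPoolExecutor` offers the same interface and runs each model in a separate process.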
10. Choose fast algorithms for big problems
There are lots of different MaxDiff algorithms. Some are faster than others. This makes a big difference if you have a big MaxDiff analysis. To the best of my knowledge, the fastest safe approaches are, in order:
- Latent class analysis with a small number of segments (e.g., 5 or fewer).
- Hierarchical Bayes.
- Normal mixing distribution with a full covariance matrix estimated via EM algorithm. You can implement this exotic algorithm in Q by setting up the MaxDiff analysis as a Ranking question, and using Create > Latent Class Analysis > Advanced, setting the distribution to Multivariate Normal – Full Covariance, and unchecking the Pooling option, estimating only a single class/segment. This model is theoretically very similar to Hierarchical Bayes, but in practice seems a bit worse than Sawtooth’s Hierarchical Bayes.
- A mixture of normal mixing distributions. This is a combination of the normal mixing distribution and latent class analysis (i.e., it is latent class analysis, where each class has its own multivariate normal mixing distribution).
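One rough intuition for this ranking is the number of population-level parameters each model has to estimate: a full covariance matrix grows with the square of the number of alternatives, so it gets expensive quickly in big studies. The counts below assume one utility per alternative with one fixed at zero; they do not cover Hierarchical Bayes, which also samples respondent-level utilities, and actual run time also depends on the algorithm and implementation.

```python
def n_params_latent_class(n_segments, n_alternatives):
    """Free utilities per segment plus free segment sizes."""
    return n_segments * (n_alternatives - 1) + (n_segments - 1)

def n_params_full_covariance_normal(n_alternatives):
    """Means plus unique entries of a full covariance matrix."""
    d = n_alternatives - 1  # free utilities after fixing one alternative at zero
    return d + d * (d + 1) // 2

def n_params_mixture_of_normals(n_components, n_alternatives):
    """Each component has its own full-covariance normal, plus mixing weights."""
    return n_components * n_params_full_covariance_normal(n_alternatives) + (n_components - 1)

for j in (10, 100):
    print(j, n_params_latent_class(5, j), n_params_full_covariance_normal(j))
```

With 100 alternatives, a 5-segment latent class model has 499 parameters, while a single full-covariance normal already has 5,049.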
11. Use cluster analysis if latent class analysis does not give you good segments
If you want to do segmentation, latent class analysis should always be your first port of call, as described in Tip 4. However, if you have time, or if you get uninteresting segments (see Tip 5), a good alternative is to:
- Estimate a continuous mixture model (see Tip 4).
- Estimate respondent-level preference share estimates. See the section Respondent-level preference shares in How MaxDiff Analysis Works (Simplish, but Not for Dummies) for more information about computing respondent-level preference shares.
- Form segments using cluster analysis.
- Use the usual tricks to improve the cluster analysis. For example, if you find segments dominated by preference for a particular brand, you can leave out the variable that relates to this brand, merge together segments, scale the variables, conduct cluster analysis within a segment, and so on.
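The steps above can be sketched as follows, assuming respondent-level utilities are already available from a continuous mixture model. Preference shares are computed with the logit (softmax) transformation and then clustered with plain k-means (Lloyd's algorithm, with farthest-first initialization); k-means is one choice of cluster analysis, not necessarily the one the author uses, and the data are hypothetical. Leaving out a brand's variable, as suggested above, amounts to dropping that column before clustering.

```python
import math
import random

def to_shares(utilities):
    """Logit (softmax) transformation of one respondent's utilities."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) with farthest-first initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each respondent joins the nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged: no respondent changed segment
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical respondent-level utilities for three alternatives:
# 20 people who favour alternative 0 and 20 who favour alternative 2.
data_rng = random.Random(1)
utilities = [[3 + data_rng.gauss(0, 0.3), 0.0, 0.0] for _ in range(20)]
utilities += [[0.0, 0.0, 3 + data_rng.gauss(0, 0.3)] for _ in range(20)]
shares = [to_shares(u) for u in utilities]
labels, centers = kmeans(shares, k=2)
```

With real data, try several values of k and judge the resulting segments on their strategic usefulness as well as their statistics.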
Author: Tim Bock
Tim Bock is the founder of Displayr. Tim is a data scientist, who has consulted, published academic papers, and won awards, for problems/techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q www.qresearchsoftware.com, a data science product designed for survey research, which is used by all the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.