This post describes how to create and check a MaxDiff experimental design. If you are not sure what this is, best to read An introduction to MaxDiff first.
Creating the design
- In Q, select Create > Marketing > MaxDiff > Experimental Design.
- Specify the number of Alternatives in the study. In my technology study, for example, I had 10 brands, so I enter the number of alternatives as 10. The alternatives can be labeled if you wish, or shown as numbers.
- Specify the number of Alternatives per question. I tend to set this at 5. Where I have studies where the alternatives are wordy, I like to instead use only 4 alternatives per question. Where the alternatives are really easy to understand I have used 6. The key trade-off here is cognitive difficulty for the respondent. The harder the questions, the more likely people are to not consider them very carefully.
- Specify the number of Questions to ask. A rule of thumb provided by the good folk at Sawtooth Software states the ideal number of questions: 3 * Alternatives / Alternatives per question. This would suggest that in the technology study, I should have used 3 * 10 / 5 = 6 questions, which is indeed the number that I used in the study. There are two conflicting factors to trade off when setting the number of questions. The more questions, the more respondent fatigue, and the worse your data becomes. The fewer questions, the less data, and the harder it is to work out the relative appeal of alternatives that have a similar level of overall appeal. I return to this topic in the discussion of checking designs, below.
- Specify the number of Versions to ask. Where the focus is only on comparing the alternatives (e.g., identifying the best from a series of product concepts), it is a good idea to create multiple versions of the design so as to reduce order and context effects. Sawtooth Software suggests that, when using multiple versions, 10 is sufficient to minimize order and context effects, although there is no good reason not to have a separate design for each respondent. Where the goal of the study is to compare different people, such as when performing segmentation studies, it is often appropriate to use a single version (if you have multiple designs, this is a source of variation between respondents, and may influence the segmentation).
- Q's algorithm includes a randomization component. Occasionally, this can lead to poor designs being found (how to check for this is described below). Sometimes this problem can be remedied by increasing the number of Repeats.
- The alternatives for each question can be shown in the numeric order (or the order of the labels), or a random order.
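The rule of thumb for the number of questions is easy to compute by hand, but a small helper makes it simple to try different settings. This is only a sketch in Python (not something Q provides); the function name is made up, and rounding up to a whole question is my assumption.

```python
import math

def suggested_questions(n_alternatives, alternatives_per_question):
    """Rule of thumb: 3 * alternatives / alternatives per question.

    Rounding up to a whole number of questions is an assumption made here.
    """
    return math.ceil(3 * n_alternatives / alternatives_per_question)

print(suggested_questions(10, 5))  # 6, as in the technology study
print(suggested_questions(10, 4))  # 8
```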
Interpreting the design
The experimental design is typically shown as a table. In the example below, each row represents a question. Each column shows which alternatives appear as options in each of the questions. Thus, in the first question, the respondent evaluates Alternatives 1, 3, 5, 6, and 10. More complicated designs can have additional information (this is discussed below).
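It can help to see the same design in binary form, with one column per alternative and a 1 wherever that alternative appears in a question. The sketch below is plain Python with a hypothetical helper name, and it only uses the first question described above, since that is the only row quoted in the text.

```python
def to_binary_design(design, n_alternatives):
    """Return one row per question and one 0/1 column per alternative."""
    return [
        [1 if alt in question else 0 for alt in range(1, n_alternatives + 1)]
        for question in design
    ]

first_question = [1, 3, 5, 6, 10]  # the first row of the design table
print(to_binary_design([first_question], 10))
# [[1, 0, 1, 0, 1, 1, 0, 0, 0, 1]]
```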
More complicated designs
I tend to add one additional complication to my MaxDiff studies: I have the data collection randomize the order of the alternatives between respondents. One and only one respondent had brands shown in this order: Apple, Google, Samsung, Sony, Microsoft, Intel, Dell, Nokia, IBM, and Yahoo. So, whenever Apple appeared it was at the top, whenever Google appeared, it was below Apple if Apple appeared, but at the top otherwise, etc. The next respondent had the brands in a different order, and so on.
If doing randomization like this, I strongly advise having this randomization done in the data collection software. You can then undo it when creating the data file, enabling you to conduct the analysis as if no randomization ever occurred.
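To make the idea concrete, here is a rough sketch of ordering a question by a respondent's own brand order, and then mapping the on-screen position that was picked back to the brand. The function names are hypothetical, the reversed order for the second respondent is invented, and treating the brand list above as alternatives 1 to 10 is an assumption. Real data collection software will have its own way of doing this.

```python
BRANDS = ["Apple", "Google", "Samsung", "Sony", "Microsoft",
          "Intel", "Dell", "Nokia", "IBM", "Yahoo"]

def shown_order(question, respondent_order):
    """Sort a question's alternatives by this respondent's brand order."""
    return sorted(question, key=respondent_order.index)

def undo_randomization(picked_position, question, respondent_order):
    """Map the on-screen position that was picked back to the brand."""
    return shown_order(question, respondent_order)[picked_position]

# Alternatives 1, 3, 5, 6 and 10, assuming BRANDS is numbered 1 to 10.
question = ["Apple", "Samsung", "Microsoft", "Intel", "Yahoo"]
order = list(reversed(BRANDS))  # an invented order for another respondent

print(shown_order(question, order))
# ['Yahoo', 'Intel', 'Microsoft', 'Samsung', 'Apple']
print(undo_randomization(0, question, order))  # Yahoo
```

Storing the brand rather than the screen position in the data file is what lets the analysis proceed as if no randomization had occurred.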
There are many other ways of complicating designs, such as to deal with large numbers of alternatives, and to prevent certain pairs of alternatives appearing together. Click here for more information about this.
Checking the design
In an ideal world, a MaxDiff experimental design has the following characteristics, where each alternative appears:
- At least 3 times.
- The same number of times.
- With each other alternative the same number of times (e.g., each alternative appears with each other alternative twice).
Due to a combination of maths and a desire to avoid respondent fatigue, few MaxDiff experimental designs satisfy these three requirements (the last one is particularly tough).
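The three characteristics can be checked mechanically. The sketch below is my own Python, not Q's output, and the design it checks is not the one from this post: it is a small textbook design with 7 alternatives and 3 per question, chosen because it happens to satisfy all three requirements.

```python
from collections import Counter
from itertools import combinations

def check_design(design, n_alternatives, min_appearances=3):
    """Check the three ideal properties of a MaxDiff design."""
    alts = range(1, n_alternatives + 1)
    # How often each alternative appears.
    freq = Counter(a for q in design for a in q)
    # How often each pair of alternatives appears in the same question.
    pairs = Counter(p for q in design for p in combinations(sorted(q), 2))
    counts = [freq[a] for a in alts]
    pair_counts = [pairs[p] for p in combinations(alts, 2)]
    return {
        "appears_at_least_min": min(counts) >= min_appearances,
        "equal_frequencies": len(set(counts)) == 1,
        "equal_pairings": len(set(pair_counts)) == 1,
    }

# Illustrative balanced design: 7 alternatives, 3 per question, 7 questions.
balanced = [[1, 2, 3], [1, 4, 5], [1, 6, 7], [2, 4, 6],
            [2, 5, 7], [3, 4, 7], [3, 5, 6]]
print(check_design(balanced, 7))
# {'appears_at_least_min': True, 'equal_frequencies': True, 'equal_pairings': True}
```

Note that this design needs 7 questions for only 7 alternatives, which hints at why the last requirement is so hard to meet in practice.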
Above, I described a design with 10 alternatives, 5 alternatives per question, and 6 questions. If you select the Detailed outputs option, you are shown a series of outputs that allows you to verify that the design shown earlier meets the first two of these requirements, and does an OK job on the last one (each alternative appears either once or twice with each other alternative).
The screen shot below shows Q where I have reduced the number of alternatives per question from 5 to 4. This small change has made a good design awful. How can we see it is awful? The first thing to note is that 6 warnings are shown at the top of the screen (if you do not see 6, click the More button to the bottom-right of the warnings).
The first warning is telling us that we have ignored the advice about how to compute the number of questions, and we should instead have at least 8 questions. (Or, more alternatives per question.)
The second warning is telling us that we have an alternative that only appears two times, whereas good practice is that we should have each alternative appearing three times.
The third warning tells us that some alternatives appear more regularly than others. Looking at the frequencies output, we can see that options appeared either 2 or 3 times. Why does this matter? It means we have collected more information about some of the alternatives than others, so may end up with different levels of precision of our estimates of the appeal of different alternatives.
The fourth warning is a bit cryptic. To understand it we need to look at the binary correlations, which are shown below. This correlation matrix shows the correlations between each of the columns of the experimental design (i.e., binary.design shown above). Looking at row 4 and column 8 we see a big problem. Alternatives 4 and 8 are perfectly negatively correlated. That is, whenever alternative 4 appears in the design alternative 8 does not appear, and whenever 8 appears, 4 does not appear. One of the cool things about MaxDiff is that it can sometimes still work even with such a flaw in the experimental design; however, it would be a foolhardy person that would rely on this, because the basic purpose of MaxDiff is to work out relative preferences between alternatives, and its ability to do this is clearly compromised if some alternatives are never shown with others.
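If you want to reproduce this kind of check outside Q, the binary correlations are just Pearson correlations between the 0/1 columns of the design. Here is a minimal sketch in plain Python. The toy design is invented to show a perfect negative correlation, and the function names are my own.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # alternative shown in every question or in none
    return cov / math.sqrt(var_x * var_y)

def binary_correlations(design, n_alternatives):
    """Correlation matrix between the 0/1 columns of the design."""
    columns = [[1 if alt in q else 0 for q in design]
               for alt in range(1, n_alternatives + 1)]
    return [[pearson(a, b) for b in columns] for a in columns]

# Toy design: alternatives 1 and 3 never appear in the same question.
toy = [[1, 2], [3, 4], [1, 2], [3, 4]]
corr = binary_correlations(toy, 4)
print(corr[0][2])  # -1.0
print(corr[0][1])  # 1.0
```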
The fifth warning tells us that there is a large range in the correlations. In most experimental designs, the ideal design results in a correlation of 0 between all the variables. MaxDiff designs differ from this, as, on average, there will always be a negative correlation between the variables. However, the basic idea is the same: we strive for designs where the correlations are as close to 0 as possible. Correlations in the range of -0.5 and 0.5 should, in my opinion, cause no concern.
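A short derivation shows why the average correlation has to be negative. The notation is my own: $x_{qi}$ is 1 if alternative $i$ appears in question $q$ and 0 otherwise, $n$ is the number of alternatives, and $k$ is the number of alternatives per question. The final step assumes every alternative appears the same number of times, so that all columns have the same variance $\sigma^2$.

```latex
\begin{align*}
\sum_{i=1}^{n} x_{qi} &= k \quad \text{for every question } q
  && \text{(each question shows exactly $k$ alternatives)} \\
0 = \operatorname{Var}\Big(\sum_{i=1}^{n} x_i\Big)
  &= \sum_{i=1}^{n} \operatorname{Var}(x_i)
   + \sum_{i \neq j} \operatorname{Cov}(x_i, x_j) \\
\sum_{i \neq j} \operatorname{Cov}(x_i, x_j)
  &= -\sum_{i=1}^{n} \operatorname{Var}(x_i) < 0 \\
\bar{\rho} = \frac{1}{n(n-1)} \sum_{i \neq j} \rho_{ij}
  &= \frac{-n\sigma^2}{n(n-1)\,\sigma^2} = -\frac{1}{n-1}
  && \text{(equal variances } \sigma^2 \text{)}
\end{align*}
```

With 10 alternatives this gives an average correlation of $-1/9 \approx -0.11$, so a balanced design sits slightly below 0 rather than at 0.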
The last warning tells us that some alternatives never appear together. We already deduced this from the binary correlations.
Checking designs with multiple versions
When you set the number of versions to more than 1, this will not change any of the warnings described in the previous section. All of these warnings relate to the quality of the design for an individual person. Increasing the number of versions improves the design for estimating results for the total sample, but this does not mean the designs change in any way for individual respondents. Thus, if you are doing any analyses at the respondent level, changing the number of versions does not help in any way.
Additional detailed outputs are provided when using multiple versions, which show the properties of the design as a whole. These show the binary correlations across the entire design, and the pairwise frequencies. Their interpretation is as described above, except that it relates to the entire design, rather than to the design of each version.
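The difference between per-version and whole-design properties can be illustrated with a deliberately tiny, invented example. In the Python sketch below (hypothetical function names, not Q's outputs), every version on its own leaves most pairs of alternatives never appearing together, yet the pooled pairwise frequencies are perfectly balanced.

```python
from collections import Counter
from itertools import combinations

def pairwise_frequencies(design):
    """Count how often each pair of alternatives shares a question."""
    return Counter(p for q in design for p in combinations(sorted(q), 2))

def pooled_pairwise_frequencies(versions):
    """Add up the pairwise frequencies over all versions."""
    total = Counter()
    for design in versions:
        total += pairwise_frequencies(design)
    return total

# Invented example: 4 alternatives, 2 per question, 2 questions, 3 versions.
versions = [
    [[1, 2], [3, 4]],
    [[1, 3], [2, 4]],
    [[1, 4], [2, 3]],
]
all_pairs = list(combinations(range(1, 5), 2))

for design in versions:
    freq = pairwise_frequencies(design)
    print(sum(1 for p in all_pairs if freq[p] == 0))  # 4 pairs never together

pooled = pooled_pairwise_frequencies(versions)
print(all(pooled[p] == 1 for p in all_pairs))  # True
```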
How to fix a poor design
The first thing to do when you have a poor design is to increase the setting for Repeats. Start by setting it to 10. Then, if you have patience, try 100, and then bigger numbers. This only occasionally works. But, when it does work it is a good outcome.
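To give a feel for why more repeats can help, here is a toy random-restart search in Python. This is not Q's algorithm, which is not described in this post. It simply draws random designs and keeps the one with the smallest imbalance, where the imbalance score (range of appearance counts plus range of pairwise counts) is a measure I made up for illustration.

```python
import random
from collections import Counter
from itertools import combinations

def imbalance(design, n_alternatives):
    """Made-up score: lower means more balanced frequencies and pairings."""
    alts = range(1, n_alternatives + 1)
    freq = Counter(a for q in design for a in q)
    pairs = Counter(p for q in design for p in combinations(sorted(q), 2))
    counts = [freq[a] for a in alts]
    pair_counts = [pairs[p] for p in combinations(alts, 2)]
    return (max(counts) - min(counts)) + (max(pair_counts) - min(pair_counts))

def best_of_repeats(n_alternatives, per_question, n_questions, repeats, seed=0):
    """Draw `repeats` random designs and keep the least imbalanced one."""
    rng = random.Random(seed)
    best, best_score = None, None
    for _ in range(repeats):
        candidate = [
            sorted(rng.sample(range(1, n_alternatives + 1), per_question))
            for _ in range(n_questions)
        ]
        score = imbalance(candidate, n_alternatives)
        if best_score is None or score < best_score:
            best, best_score = candidate, score
    return best, best_score

for repeats in (1, 10, 100):
    _, score = best_of_repeats(10, 4, 6, repeats)
    print(repeats, score)
```

With a fixed seed, the score can only stay the same or improve as repeats increase, because the larger run sees every candidate the smaller run saw. It may still never reach a good design, which matches the experience that increasing Repeats only occasionally works.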
If Repeats does not work, you need to change something else. Reducing the number of alternatives and/or increasing the number of questions are usually the best places to start.