Creating the experimental design for a MaxDiff experiment is easy in R. This post describes how to create and check a MaxDiff experimental design. If you are not sure what this is, it would be best to read A beginner’s guide to MaxDiff first.
Step 1: Installing the packages
The first step is to install the flipMaxDiff package and a series of dependent packages. Depending on how your R has been set up, you may need to install none of these (e.g., if using Displayr), or even more packages than are shown below.
install.packages("devtools")
library(devtools)
install.packages("AlgDesign")
install_github("Displayr/flipData")
install_github("Displayr/flipTransformations")
install.packages("Rcpp")
install_github("Displayr/flipMaxDiff")
Step 2: Creating the design
The MaxDiffDesign function is a wrapper for the optBlock function in the wonderful AlgDesign package. The following snippet can be used to create a design. The arguments used in the code snippet here are described immediately below.
library(flipMaxDiff)
MaxDiffDesign(number.alternatives = 10,
              alternatives.per.question = 5,
              number.questions = 6,
              n.repeats = 1)
- number.alternatives: The total number of alternatives considered in the study. In my technology study, for example, I had 10 brands, so I enter the number of alternatives as 10.
- alternatives.per.question: The number of alternatives shown to the respondents in each individual task. I tend to set this at 5. Where I have studies where the alternatives are wordy, I like to reduce this to 4. Where the alternatives are really easy to understand, I have used 6. The key trade-off here is cognitive difficulty for the respondent. The harder the questions, the more likely people are to not consider them very carefully.
- number.questions: The number of questions (i.e., tasks or sets) to present to respondents. A rule of thumb provided by the good folks at Sawtooth Software states the ideal number of questions: 3 * number.alternatives/alternatives.per.question. This would suggest that in the technology study, I should have used 3 * 10 / 5 = 6 questions, which is indeed the number that I used in the study. There are two conflicting factors to trade off when setting the number of questions. The more questions, the more respondent fatigue, and the worse your data becomes. The fewer questions, the less data, and the harder it is to work out the relative appeal of alternatives that have a similar level of overall appeal. I return to this topic in the discussion of checking designs, below.
- n.versions: The number of versions to use. Where the focus is only on comparing the alternatives (e.g., identifying the best from a series of product concepts), it is a good idea to create multiple versions of the design so as to reduce order and context effects. Sawtooth Software suggests that, if using multiple versions, 10 is sufficient to minimize order and context effects, although there is no good reason not to have a separate design for each respondent. Where the goal of the study is to compare different people, such as when performing segmentation studies, it is often appropriate to use a single version (if you have multiple designs, this is a source of variation between respondents, and may influence the segmentation).
- n.repeats: The algorithm includes a randomization component. Occasionally, this can lead to poor designs being found (how to check for this is described below). Sometimes this problem can be remedied by increasing n.repeats.
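The rule of thumb for number.questions is easy to wrap in a small helper. This is only a sketch: the name SuggestedQuestions is hypothetical (it is not part of flipMaxDiff), and rounding up with ceiling is my assumption for cases where the formula gives a fraction.

```r
# Hypothetical helper (not part of flipMaxDiff): the rule of thumb
# 3 * number.alternatives / alternatives.per.question, rounded up
# (an assumption) so a fractional result still gives a whole number.
SuggestedQuestions <- function(number.alternatives, alternatives.per.question) {
    ceiling(3 * number.alternatives / alternatives.per.question)
}

SuggestedQuestions(10, 5)  # 3 * 10 / 5 = 6
SuggestedQuestions(10, 4)  # 3 * 10 / 4 = 7.5, rounded up to 8
```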
Step 3: Interpreting the design
The design is called the binary.design. Each row represents a question. Each column represents an alternative and indicates whether that alternative is shown in the question. Thus, in the first question, the respondent evaluates alternatives 1, 3, 5, 6, and 10. More complicated designs can have additional information (this is discussed below).
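To make the layout concrete, here is a minimal sketch of reading such a matrix in base R. The matrix is a hypothetical two-question excerpt that I typed in by hand: the first row matches the first question described above, and the second row is made up purely for illustration.

```r
# Hypothetical excerpt of a binary design with 10 alternatives.
# Row 1 matches the first question described in the text; row 2 is invented.
binary.design <- rbind(c(1, 0, 1, 0, 1, 1, 0, 0, 0, 1),
                       c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0))
rownames(binary.design) <- paste("Question", 1:2)
colnames(binary.design) <- 1:10

# For each question (row), list the alternatives whose column contains a 1.
shown <- lapply(seq_len(nrow(binary.design)),
                function(i) unname(which(binary.design[i, ] == 1)))
shown[[1]]  # alternatives 1, 3, 5, 6, and 10
```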
I tend to add one additional complication to my MaxDiff studies. I get the data collection to involve randomization of the order of the alternatives between respondents. One and only one respondent had brands shown in this order: Apple, Google, Samsung, Sony, Microsoft, Intel, Dell, Nokia, IBM, and Yahoo. So, whenever Apple appeared it was at the top, whenever Google appeared, it was below Apple if Apple appeared, but at the top otherwise, etc. The next respondent had the brands in a different order, and so on.
If doing randomization like this, I strongly advise having this randomization done in the data collection software. You can then undo it when creating the data file, enabling you to conduct the analysis as if no randomization ever occurred.
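The idea can be sketched in base R, although in practice this logic belongs in the data collection software. All variable names here are hypothetical, and set.seed is used only so the sketch is reproducible.

```r
brands <- c("Apple", "Google", "Samsung", "Sony", "Microsoft",
            "Intel", "Dell", "Nokia", "IBM", "Yahoo")

set.seed(123)  # only to make this sketch reproducible
# One random ordering of all 10 alternatives for a (hypothetical) respondent.
respondent.order <- sample(seq_along(brands))

# The alternatives in one question, by design number (the first question above).
question <- c(1, 3, 5, 6, 10)

# Display order: the respondent's overall ordering, restricted to this question.
displayed <- respondent.order[respondent.order %in% question]
brands[displayed]

# Undoing the randomization when building the data file: back to design order.
restored <- sort(displayed)
```

Because the respondent's ordering is stored as a permutation of the design numbers, sorting is enough to recover the original design order for analysis.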
There are many other ways of complicating designs, for example to deal with large numbers of alternatives, or to prevent certain pairs of alternatives from appearing together. Click here for more information about this.
Step 4: Checking the design
In an ideal world, a MaxDiff experimental design has the following characteristics, where each alternative appears:
- At least 3 times.
- The same number of times.
- With each other alternative the same number of times (e.g., each alternative appears with each other alternative twice).
Due to a combination of maths and a desire to avoid respondent fatigue, few MaxDiff experimental designs satisfy these three requirements (the last one is particularly tough).
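These three properties can be checked directly on any binary design matrix using base R. The sketch below uses a hypothetical helper, CheckBinaryDesign, which is not part of flipMaxDiff. The example design (7 alternatives, 7 questions, 3 per question) was chosen only because it happens to satisfy all three properties.

```r
# Hypothetical helper (not part of flipMaxDiff): summarises a binary design
# in which rows are questions and columns are alternatives.
CheckBinaryDesign <- function(binary.design, min.frequency = 3) {
    frequencies <- colSums(binary.design)
    # crossprod(X) is t(X) %*% X: entry [i, j] counts the questions that
    # contain both alternative i and alternative j.
    pairwise <- crossprod(binary.design)
    off.diagonal <- pairwise[upper.tri(pairwise)]
    list(frequencies = frequencies,
         pairwise = pairwise,
         enough.appearances = all(frequencies >= min.frequency),
         equal.frequencies = length(unique(frequencies)) == 1,
         equal.pairs = length(unique(off.diagonal)) == 1)
}

# A small balanced example, used only for illustration.
questions <- list(c(1, 2, 3), c(1, 4, 5), c(1, 6, 7), c(2, 4, 6),
                  c(2, 5, 7), c(3, 4, 7), c(3, 5, 6))
balanced <- t(sapply(questions, function(q) as.numeric(1:7 %in% q)))

check <- CheckBinaryDesign(balanced)
check$frequencies  # how often each alternative appears
check$pairwise     # how often each pair appears together
```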
Above, I described a design with 10 alternatives, 5 alternatives per question, and 6 questions. Below, I show the outputs where I have changed the number of alternatives per question from 5 to 4. This small change has made a good design awful. How can we see it is awful? The first thing to note is that 6 warnings are shown at the bottom.
The first warning is telling us that we have ignored the advice about how to compute the number of questions, and we should instead have at least 8 questions. (Or, more alternatives per question.)
The second warning is telling us that we have an alternative that only appears two times, whereas good practice is that we should have each alternative appearing three times.
The third warning tells us that some alternatives appear more regularly than others. Looking at the frequencies output, we can see that options appeared either 2 or 3 times. Why does this matter? It means we have collected more information about some of the alternatives than others, so may end up with different levels of precision of our estimates of the appeal of different alternatives.
The fourth warning is a bit cryptic. To understand it we need to look at the binary correlations, which are reproduced below. This correlation matrix shows the correlations between each of the columns of the experimental design (i.e., binary.design shown above). Looking at row 4 and column 8 we see a big problem. Alternative 4 and 8 are perfectly negatively correlated. That is, whenever alternative 4 appears in the design alternative 8 does not appear, and whenever 8 appears, 4 does not appear. One of the cool things about MaxDiff is that it can sometimes still work even with such a flaw in the experimental design. It would, however, be foolhardy to rely on this. The basic purpose of MaxDiff is to work out relative preferences between alternatives, and its ability to do this is clearly compromised if some alternatives are never shown with others.
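The same diagnosis can be reproduced with cor() in base R. The design below is hypothetical: I built it by hand so that alternatives A and B never appear in the same question, which gives them a correlation of -1.

```r
# Hypothetical design: 6 questions (rows), 6 alternatives (columns),
# 3 alternatives per question. A and B never appear together.
design <- cbind(A = c(1, 0, 1, 0, 1, 0),
                B = c(0, 1, 0, 1, 0, 1),
                C = c(1, 1, 0, 0, 1, 0),
                D = c(0, 1, 1, 0, 0, 1),
                E = c(1, 0, 0, 1, 0, 1),
                F = c(0, 0, 1, 1, 1, 0))

# Binary correlations between the columns of the design.
correlations <- cor(design)
round(correlations, 2)

# Pairs that are (numerically) perfectly negatively correlated.
never.together <- which(correlations < -0.999 & upper.tri(correlations),
                        arr.ind = TRUE)
never.together
```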
The fifth warning tells us that there is a large range in the correlations. In most experimental designs, the ideal design results in a correlation of 0 between all the variables. MaxDiff designs differ from this, as, on average, there will always be a negative correlation between the variables. However, the basic idea is the same: we strive for designs where the correlations are as close to 0 as possible. Correlations between -0.5 and 0.5 should, in my opinion, cause no concern.
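A short base R sketch shows where the negative average comes from. When every question shows the same number of alternatives and every alternative appears equally often, the off-diagonal correlations average -1 / (number of alternatives - 1). The design used here is a hypothetical balanced example (7 alternatives, 7 questions, 3 per question), not output from flipMaxDiff.

```r
# Hypothetical balanced design: rows are questions, columns are alternatives.
questions <- list(c(1, 2, 3), c(1, 4, 5), c(1, 6, 7), c(2, 4, 6),
                  c(2, 5, 7), c(3, 4, 7), c(3, 5, 6))
design <- t(sapply(questions, function(q) as.numeric(1:7 %in% q)))

correlations <- cor(design)
off.diagonal <- correlations[upper.tri(correlations)]

range(off.diagonal)             # the spread of the binary correlations
mean(off.diagonal)              # compare with -1 / (ncol(design) - 1)
all(abs(off.diagonal) <= 0.5)   # the -0.5 to 0.5 guideline from the text
```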
The last warning tells us that some alternatives never appear together. We already deduced this from the binary correlations.
The first thing to do when you have a poor design is to increase the setting for n.repeats. Start by setting it to 10. Then, if you have patience, try 100, and then bigger numbers. This only occasionally works. But, when it does work it is a good outcome. If this does not work, you need to change something else. Reducing the number of alternatives and/or increasing the number of questions are usually the best places to start.
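As a sketch, the retry can be scripted with the same arguments as in Step 2. The variable repeats.to.try is my own name, and the block only runs if flipMaxDiff is installed. I print the whole result rather than picking out pieces, because I am not assuming anything about its internal structure.

```r
# Settings for n.repeats to try in turn, as suggested above.
repeats.to.try <- c(10, 100)

# Only run when flipMaxDiff is available; arguments are those described in Step 2.
if (requireNamespace("flipMaxDiff", quietly = TRUE)) {
    for (n.repeats in repeats.to.try) {
        design <- flipMaxDiff::MaxDiffDesign(number.alternatives = 10,
                                             alternatives.per.question = 4,
                                             number.questions = 6,
                                             n.repeats = n.repeats)
        print(design)  # inspect the outputs and warnings at each setting
    }
}
```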
Checking designs with multiple versions
When you set the number of versions to more than 1, this will not change any of the warnings described in the previous section. All of these warnings relate to the quality of the design for an individual person. Increasing the number of versions improves the design for estimating results for the total sample, but this does not mean the designs change in any way for individual respondents. Thus, if you are doing any analyses at the respondent level, changing the number of versions does not help in any way.
Additional detailed outputs are provided when using multiple versions, which show the properties of the design as a whole. These show the binary correlations across the entire design, and the pairwise frequencies. Their interpretation is as described above, except that it relates to the entire design, rather than to the design of each version.
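The difference between per-version and whole-design properties can be illustrated in base R by stacking versions. This is a sketch under stated assumptions: both versions are hypothetical, and the second is just the first with its columns shuffled, which is not necessarily how flipMaxDiff constructs versions.

```r
# Hypothetical version 1: A and B never appear in the same question.
version1 <- cbind(A = c(1, 0, 1, 0, 1, 0),
                  B = c(0, 1, 0, 1, 0, 1),
                  C = c(1, 1, 0, 0, 1, 0),
                  D = c(0, 1, 1, 0, 0, 1),
                  E = c(1, 0, 0, 1, 0, 1),
                  F = c(0, 0, 1, 1, 1, 0))

# Hypothetical version 2: the same questions with the alternatives relabelled
# (an assumption made for illustration only).
version2 <- version1[, c(3, 1, 2, 4, 5, 6)]
colnames(version2) <- colnames(version1)

# Properties within a single version: A and B are never paired.
crossprod(version1)["A", "B"]

# Properties of the design as a whole: stack the versions, then recompute.
whole.design <- rbind(version1, version2)
pooled.pairs <- crossprod(whole.design)   # pairwise frequencies
pooled.correlations <- cor(whole.design)  # binary correlations
pooled.pairs["A", "B"]
```

Across the stacked design A and B do appear together, yet a respondent who only sees version 1 still never compares them directly, which is why the per-version warnings are unaffected.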
Author: Tim Bock
Tim Bock is the founder of Displayr. Tim is a data scientist, who has consulted, published academic papers, and won awards, for problems/techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q www.qresearchsoftware.com, a data science product designed for survey research, which is used by all the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.