Unless you've been living under a rock, you will have come across a survey asking you to pick the 'most likely' and 'least likely' options from a list. These surveys are designed for MaxDiff analysis, a technique for deriving preference or importance scores for multiple items. In this post I'll give some background on how MaxDiff designs are created, then I'll show you a new method for making multiple-version designs that are pairwise balanced. This method is a vast improvement over creating multiple-version designs by random permutation.

## Creating single version designs

These earlier posts describe how to create MaxDiff experimental designs in Displayr, Q and with R. They also give some guidelines on how to set the numbers of questions and alternatives per question, as well as advice on interpreting designs.

The standard method used to create designs aims to maximize the amount of information that can be extracted from responses. This naturally involves showing each alternative approximately the same number of times and showing each pair of alternatives together approximately the same number of times. Experimental design is a relatively complex topic, but fortunately packaged algorithms do the hard work for us.

As an example, I show below a design with 10 alternatives, 5 alternatives per question and 6 questions.

This design has a single version - each respondent is asked the same questions, selecting the best and worst from the same sets of alternatives.

The design can also be described as a binary matrix, where the presence of a 1 indicates which alternatives are shown in each question. The binary matrix for the design above is shown below.
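To make the representation concrete, here is a small Python sketch that converts a question-by-question design into its binary matrix. The design rows below are hypothetical (the post's actual design is generated by flipMaxDiff), but they match the stated shape: 10 alternatives, 5 per question, 6 questions, with every alternative shown exactly 3 times.

```python
# Hypothetical single-version design: 6 questions, 5 of 10 alternatives each,
# chosen so that every alternative appears exactly 3 times. Illustrative only.
design = [
    [1, 2, 3, 4, 5],
    [1, 2, 6, 7, 8],
    [1, 3, 6, 9, 10],
    [2, 4, 7, 9, 10],
    [3, 5, 7, 8, 9],
    [4, 5, 6, 8, 10],
]

def to_binary(design, n_alternatives):
    """Questions x alternatives 0/1 matrix: entry [q][a] is 1 when
    alternative a + 1 is shown in question q."""
    return [[1 if a + 1 in question else 0 for a in range(n_alternatives)]
            for question in design]

binary = to_binary(design, 10)
for row in binary:
    print(row)
```

Summing each row recovers the number of alternatives per question (5), and summing each column recovers how often each alternative is shown (3).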

Provided the guidelines for producing a good design are adhered to, this procedure usually produces an excellent single-version design. In rare cases with many alternatives, it can be advantageous to increase the number of repeats.

## Creating multiple version designs

A simple procedure to create another version is to randomly swap the columns of the binary design. Below we can see that the first column of the original version has moved to the seventh column of the new version. Whenever alternative 1 appeared in the original version, we now show alternative 7 in the second version of the design.

You can repeat this process for as many versions as are required. It has the advantage of preserving the distribution of the frequencies of occurrence of the alternatives within each version: since each alternative appeared 3 times in the original design, each alternative appears 3 times in every other version. The same is true of the distribution of pairwise occurrences of alternatives within questions for each version.
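The column swap can be sketched in a few lines of Python. Relabelling the alternatives of a version is equivalent to permuting the columns of its binary matrix; the version used here is a hypothetical one with the same shape as the post's design.

```python
import random
from collections import Counter

# Hypothetical original version: 6 questions, 5 of 10 alternatives each,
# with every alternative shown exactly 3 times. Illustrative only.
version1 = [
    [1, 2, 3, 4, 5], [1, 2, 6, 7, 8], [1, 3, 6, 9, 10],
    [2, 4, 7, 9, 10], [3, 5, 7, 8, 9], [4, 5, 6, 8, 10],
]

random.seed(1)
perm = random.sample(range(1, 11), 10)  # perm[i - 1] is the new label of alternative i

# Relabel every alternative; this is equivalent to permuting the columns of
# the binary design matrix.
version2 = [sorted(perm[a - 1] for a in q) for q in version1]

# Within-version frequencies are preserved: every alternative still appears
# exactly 3 times in the new version.
counts = Counter(a for q in version2 for a in q)
print(sorted(counts.values()))
```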

However, one drawback is that across many versions the distribution of pairwise occurrences may become imbalanced. This is a consequence of randomly permuting the alternatives. From the pairwise occurrences across 100 versions shown below, we can see that alternatives 5 and 6 co-occur 143 times, whereas alternatives 2 and 5 co-occur only 124 times.
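You can reproduce this kind of drift with a short simulation. The sketch below generates 100 randomly permuted versions of a hypothetical base version and tallies how often each unordered pair of alternatives is shown together; the spread between the smallest and largest counts illustrates the imbalance (the exact numbers depend on the random permutations).

```python
import random
from itertools import combinations
from collections import Counter

# Hypothetical base version: 6 questions, 5 of 10 alternatives each.
base = [
    [1, 2, 3, 4, 5], [1, 2, 6, 7, 8], [1, 3, 6, 9, 10],
    [2, 4, 7, 9, 10], [3, 5, 7, 8, 9], [4, 5, 6, 8, 10],
]

random.seed(0)
versions = []
for _ in range(100):
    perm = random.sample(range(1, 11), 10)  # random relabelling of 1..10
    versions.append([[perm[a - 1] for a in q] for q in base])

# Tally how often each unordered pair of alternatives appears together
# within a question, summed over all 100 versions.
pair_counts = Counter()
for version in versions:
    for question in version:
        pair_counts.update(combinations(sorted(question), 2))

print(min(pair_counts.values()), max(pair_counts.values()))
```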

This imbalance does not usually cause any problems if Hierarchical Bayes is used to analyse the experimental results. However, it makes the results of a counting analysis more difficult to interpret, because the alternatives have not been shown with each other the same number of times.

## Creating pairwise balanced designs

An alternative strategy for extending a design to multiple versions is to attempt to maintain pairwise balance. With this method, the design is incremented one version at a time. Many randomly permuted candidate versions are considered. The candidate version that creates an overall design with the least imbalance of pairwise frequencies is chosen.
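The strategy just described can be sketched as a greedy search. This is a simplified Python illustration, not the flipMaxDiff implementation: for each new version it tries a number of random relabellings of a hypothetical base version and keeps the candidate that leaves the cumulative pairwise counts least imbalanced.

```python
import random
from itertools import combinations
from collections import Counter

# Hypothetical base version: 6 questions, 5 of 10 alternatives each.
base = [
    [1, 2, 3, 4, 5], [1, 2, 6, 7, 8], [1, 3, 6, 9, 10],
    [2, 4, 7, 9, 10], [3, 5, 7, 8, 9], [4, 5, 6, 8, 10],
]
N_ALTERNATIVES = 10

def version_pairs(version):
    """Counts of each unordered pair of alternatives shown together."""
    pairs = Counter()
    for question in version:
        pairs.update(combinations(sorted(question), 2))
    return pairs

def pairwise_imbalance(counts):
    """Sum of absolute deviations of pair counts from their mean, over all
    possible pairs (pairs never shown count as zero)."""
    all_pairs = list(combinations(range(1, N_ALTERNATIVES + 1), 2))
    mean = sum(counts.values()) / len(all_pairs)
    return sum(abs(counts[p] - mean) for p in all_pairs)

def add_version(cumulative, n_candidates=200):
    """Pick, from n_candidates random relabellings of the base version, the
    one whose addition leaves the overall design least imbalanced."""
    best, best_score = None, float("inf")
    for _ in range(n_candidates):
        perm = random.sample(range(1, N_ALTERNATIVES + 1), N_ALTERNATIVES)
        candidate = [sorted(perm[a - 1] for a in q) for q in base]
        score = pairwise_imbalance(cumulative + version_pairs(candidate))
        if score < best_score:
            best, best_score = candidate, score
    return best

random.seed(0)
versions = [base]
cumulative = version_pairs(base)
for _ in range(9):  # extend to 10 versions, one at a time
    v = add_version(cumulative)
    versions.append(v)
    cumulative += version_pairs(v)
```

The number of candidates per version trades run time against balance; more candidates give the search a better chance of finding a version that evens out the pairwise counts.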

In Displayr and Q this is achieved automatically. In R, set the balanced.versions argument of the MaxDiffDesign function in the flipMaxDiff package to TRUE.

The resulting pairwise frequencies for 100 versions are shown below. Note that the variation is now only between 133 and 134, dramatically less than the variation when the design is not pairwise balanced.

## Assessing the difference between randomly permuted and pairwise balanced designs

In order to highlight the differences between randomly permuted and pairwise balanced designs, it is useful to compare some summary statistics. The math isn't complex, but there are a few steps to each calculation, which I explain below.

Mean version balance. The balance of a version is the sum of the absolute differences between the alternative frequencies and their mean, as illustrated by the example below. The table shows the counts of each alternative within the version. The mean count is 12, so the sum of the absolute differences from 12 is 2 + 3 + 1 + 2 + 4 = 12. A perfectly balanced version would score zero. The higher the value, the more imbalanced the design.

To calculate the mean version balance, these balances are averaged across all versions. The resulting mean is standardized so that perfect balance has a score of one and the worst possible design (where each question repeats the same set of alternatives) has a score of zero.
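The two steps above can be sketched in Python. The counts below are hypothetical numbers chosen to match the worked example (mean of 12, deviations 2 + 3 + 1 + 2 + 4 = 12), and the linear rescaling from the worst-case imbalance is an assumption consistent with the description, not necessarily the exact flipMaxDiff formula.

```python
def raw_balance(counts):
    """Sum of absolute differences between alternative counts and their mean."""
    mean = sum(counts) / len(counts)
    return sum(abs(c - mean) for c in counts)

def standardized_balance(counts, n_questions, alts_per_question):
    """Rescale so 1 is perfect balance and 0 is the worst possible design,
    where every question repeats the same set of alternatives. In that worst
    design, alts_per_question alternatives are shown n_questions times and
    the rest never. (Linear rescaling is an assumption.)"""
    n = len(counts)
    q, k = n_questions, alts_per_question
    worst = 2 * q * k * (n - k) / n
    return 1 - raw_balance(counts) / worst

# Hypothetical counts matching the post's example: mean 12, deviations
# summing to 2 + 3 + 1 + 2 + 4 = 12.
counts = [10, 15, 13, 14, 8]
print(raw_balance(counts))  # 12.0
```

A perfectly balanced version, such as the 10-alternative design in which every alternative is shown exactly 3 times, scores exactly one under this rescaling.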

Mean version pairwise balance. This is an analogous calculation to the mean version balance, except that it uses the pairwise counts of how many times alternatives are shown together within each version. It is also averaged across versions and scaled so that one is perfection and zero is the worst possible.

Across version balance and Across version pairwise balance. These are calculated in the same way as above, except that the counting is done across all versions. They assess how balanced the total design is, but say nothing about the variation between versions.

## Results

Calculating the above four values for the 100-version random and pairwise balanced designs shows that mean version balance and across version balance are both 1 for both designs. This is unsurprising, since the single-version design showed each alternative exactly 3 times, and both methods of creating multiple versions preserve that balance.

Mean version pairwise balance is 0.786 for both methods. Although this may appear disappointingly low, there are 60 pairs shown in each version, allocated across 45 possible pairs. Since 60 is not divisible by 45, we can never achieve perfect pairwise balance in a single version. In fact, 0.786 is the best possible score.
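The 0.786 figure can be checked with a little arithmetic, assuming the linear standardization described above (one minus the observed imbalance divided by the worst-case imbalance). Each version contains 6 questions of 5 alternatives, giving 60 pair showings spread over 45 possible pairs; the best a single version can do is to show 15 pairs twice and 30 pairs once.

```python
from itertools import combinations

n_pairs = len(list(combinations(range(10), 2)))              # 45 possible pairs
pairs_per_version = 6 * len(list(combinations(range(5), 2))) # 6 questions x 10 pairs = 60

mean = pairs_per_version / n_pairs  # 4/3: not an integer, so perfection is impossible

# Best case: counts as equal as possible -> 15 pairs shown twice, 30 shown once.
best_imbalance = 15 * abs(2 - mean) + 30 * abs(1 - mean)

# Worst case: every question shows the same 5 alternatives -> 10 pairs shown
# 6 times each, the other 35 never.
worst_imbalance = 10 * abs(6 - mean) + 35 * mean

score = 1 - best_imbalance / worst_imbalance
print(round(score, 3))  # 0.786
```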

Across version pairwise balance reveals the improvement of the pairwise balance method with a score of 0.998 compared to 0.984 for random versions.

## Conclusion

Pairwise balance provides an improved method for producing a multi-version MaxDiff design. The summary statistics introduced above (values on a scale of zero to one) allow easy comparison of designs. Balanced designs can be easier to work with when performing a simple counting analysis of MaxDiff experiments, but are generally unnecessary if using more sophisticated methods such as Hierarchical Bayes.

## Try it yourself

The examples in this post use Displayr as a front-end to running the R code. If you go into our example document, you can see the outputs for yourself. The code that has been used to generate each of the outputs is accessible by selecting the output and clicking Properties > R CODE on the right hand side of the screen. Designs are created with our own flipMaxDiff package, within which D-efficient designs use the AlgDesign package.