Webinar

#### How to create experimental designs for conjoint

In this webinar we will show you how to create your own conjoint experimental designs.

#### In this webinar you will learn

Here’s a little summary of some of the subjects we cover in this webinar.

You'll learn how to quickly and easily:

• Create designs that work with smaller sample sizes (dual response, efficient designs)
• Maximize the realism of your conjoint studies, so that they predict more accurately (prohibitions, priors, alternative-specific designs, 'none' and 'current' alternatives)
• Deal with large numbers of attributes (partial profile designs)
• Conduct fieldwork for conjoint studies

#### Transcript

I am going to discuss how to create experimental designs for conjoint.

My focus is on the general principles, but I will be showing things in Displayr. Q works exactly the same way.

Overview

Most of today will focus on experimental design. But, towards the end I will discuss four related topics.

What is a conjoint experimental design?

What is a conjoint experimental design?

It's the set of instructions that converts a table of attributes and levels into conjoint questions.

What the design looks like

As with the MaxDiff experimental designs that we looked at a few weeks ago, a conjoint experimental design is typically a whopping big table of numbers, like this one. We are looking at the first 12 of 300 rows here.

I'll shortly show you how to create such tables.

But, let's start by working out how to read them.

Take the Brand column, for example. This is for a study with four brands, so the 1 just means the first brand, the 2 the second, and so on.

Let's replace the numbers with words to make it a bit easier.

Replacing the numbers with words

So, while a design is traditionally all numbers, it can also be written in words.

Creating the questionnaire from the design

As we will soon discuss, often there are multiple versions in a design, with each respondent seeing one of the versions.

The question column refers to the question number. We can see here that question 1 is represented by 3 rows.

The first row shows us the first option that appears in the first question. The second row shows the attribute levels of the second option. The third row the third option. The fourth row the first option in the second question and so on.
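As a rough illustration of how such a table is read, the sketch below stores a tiny made-up design as rows of (version, question, alternative, brand, price), swaps the level numbers for words, and groups the rows into questions. The column order, the level labels and the rows themselves are assumptions for illustration, not the design shown in the webinar.

```python
from collections import defaultdict

# Hypothetical level labels; a real study has its own attribute table.
LEVELS = {
    "brand": ["General Motors", "BMW", "Ferrari"],
    "price": ["$20K", "$50K", "$100K"],  # "$50K" is a placeholder level
}

# Each row: (version, question, alternative, brand level, price level).
# Level numbers are 1-based, as in a traditional numeric design.
DESIGN = [
    (1, 1, 1, 2, 3),
    (1, 1, 2, 1, 1),
    (1, 1, 3, 3, 2),
    (1, 2, 1, 3, 1),
    (1, 2, 2, 2, 2),
    (1, 2, 3, 1, 3),
]


def to_words(row):
    """Replace the numeric levels in one design row with their labels."""
    version, question, alternative, brand, price = row
    return {
        "version": version,
        "question": question,
        "alternative": alternative,
        "brand": LEVELS["brand"][brand - 1],
        "price": LEVELS["price"][price - 1],
    }


def group_into_questions(design):
    """Group worded rows by (version, question); each group is one question."""
    questions = defaultdict(list)
    for row in design:
        worded = to_words(row)
        questions[(worded["version"], worded["question"])].append(worded)
    return dict(questions)


questions = group_into_questions(DESIGN)
for (version, question), options in questions.items():
    print(f"Version {version}, question {question}")
    for option in options:
        print(f"  Option {option['alternative']}: {option['brand']} at {option['price']}")
```

Each group of three rows becomes one question with three options, which is the same reading rule described above.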

Case studies

We will start by looking at a simple toy example with two attributes in the car market. Brand and price. The reason I am using such a simple example is it makes it really easy to see what's going on.

We will then migrate to a more realistic problem which looks at choice of home delivery options.

Creating

Here are the instructions for creating experimental designs in Q and Displayr, but let's go and do it.

Random design

In Displayr:

Insert > More > Choice Modeling > Experimental Design

Attributes and levels > Add data

By default Displayr's creating something called a Balanced overlap design, but we will start with the very simplest of all designs, the random design.

Algorithm: Random

There are a whole lot of diagnostics at the top which we will return to. But, let's scroll down and look at the design.

Scroll down. Expand Out Design

So, this is our design.

I find it easiest to look at these as a preview of the actual questionnaire.

In Displayr:

Insert > More > Choice Modeling > Preview Choice Questionnaire

Take a good look at Question 1. Is it a good question? No it's not. Best case scenario is that nobody chooses the BMW at \$100K. Worst case is that people get confused and think that the \$100K BMW must be a better car. This is a terrible outcome, as it means that rather than reacting to price, they are instead using it as a cue for quality, and our whole study is stuffed.

There's a second problem as well. If you go through the questions and count up how often different attribute levels appear, you will see that we have Ferrari appearing only 7 times, but General Motors appearing 12 times. How do I know this?

The frequencies are shown here on the left in the design output. Why aren't these balanced, with the same number each time? It's the nature of randomness that we get such imbalance.
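To see why a random design ends up unbalanced, here is a minimal sketch that draws every brand and price independently at random and then counts how often each level appears. The number of questions, the number of alternatives and the middle price level are assumptions for illustration; this is not Displayr's implementation.

```python
import random
from collections import Counter

BRANDS = ["General Motors", "BMW", "Ferrari"]
PRICES = ["$20K", "$50K", "$100K"]  # "$50K" is a placeholder level


def random_design(n_questions, n_alternatives, seed=None):
    """Pick a brand and a price independently at random for every alternative."""
    rng = random.Random(seed)
    return [
        [(rng.choice(BRANDS), rng.choice(PRICES)) for _ in range(n_alternatives)]
        for _ in range(n_questions)
    ]


def level_frequencies(design):
    """Count how often each brand and each price appears across the design."""
    brand_counts = Counter(brand for question in design for brand, _ in question)
    price_counts = Counter(price for question in design for _, price in question)
    return brand_counts, price_counts


design = random_design(n_questions=10, n_alternatives=3, seed=1)
brand_counts, price_counts = level_frequencies(design)
print(brand_counts)
print(price_counts)
```

Changing the seed changes the counts, but they will rarely come out equal, and nothing stops the same brand or price from appearing twice in one question.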

How do we fix these problems? We use a more sophisticated algorithm.

Complete enumeration

The most well-known design which prevents attribute levels from reappearing in the same question is the complete enumeration algorithm by Sawtooth. Or, if you are from an academic background, orthogonal designs tend to solve this in a similar way. Rather than set up everything from scratch, I'll use Displayr's nifty duplication feature.

As you can see, we've fixed our balance problem. Now each attribute level appears the same number of times. And, if you look at the example questions, at least the same attribute level is not appearing twice. Do we have a good design now? Have a look at the questions. No, it's not a good design.

The second question's just a bit silly. Who will choose the General Motors car at twice the price of the Ferrari? You could just get the Ferrari and resell it to buy a lot of GM cars.
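The sketch below is not Sawtooth's complete enumeration algorithm. It is a much simpler stand-in that only illustrates the two properties just discussed: every level appears equally often, and no level repeats within a question. It then counts how many questions still price the General Motors car above the Ferrari. The level labels and the number of questions are assumptions.

```python
import random
from collections import Counter

BRANDS = ["General Motors", "BMW", "Ferrari"]
PRICES = ["$20K", "$50K", "$100K"]  # ordered low to high; "$50K" is a placeholder


def no_overlap_design(n_questions, seed=None):
    """Each question shows every brand and every price exactly once."""
    rng = random.Random(seed)
    design = []
    for _ in range(n_questions):
        brands = rng.sample(BRANDS, k=len(BRANDS))  # shuffled copy
        prices = rng.sample(PRICES, k=len(PRICES))  # shuffled copy
        design.append(list(zip(brands, prices)))
    return design


def gm_costs_more_than_ferrari(question):
    """True when the General Motors option is priced above the Ferrari option."""
    price_rank = {brand: PRICES.index(price) for brand, price in question}
    return price_rank["General Motors"] > price_rank["Ferrari"]


design = no_overlap_design(n_questions=10, seed=1)
brand_counts = Counter(brand for question in design for brand, _ in question)
silly_questions = sum(gm_costs_more_than_ferrari(question) for question in design)

print(brand_counts)  # perfectly balanced by construction
print(f"Questions pricing GM above Ferrari: {silly_questions}")
```

Balance and no overlap are guaranteed here by construction, yet the implausible pairings remain possible, which is the problem the next section tries to address.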

Prohibitions

One fix is to prohibit certain alternatives from appearing. This is called using prohibitions. Now, you might be thinking. Cool. Problem solved. But, it's not so simple. To explain why I need to introduce a new concept.

If we look in the top right we can see that Displayr is giving us a warning. It says we should either change the design or change the sample size to get a big enough sample.

A rule of thumb is that we want all our standard errors to be below 0.05. As you can see, some are above this. Two factors that determine the standard errors are:

1. The design itself
2. The sample size

I've done a bit of trial and error, and can tell you at 355 the error goes away.

Here the sample size is still set to 300. And we have no warning. We can reduce it all the way to 249. Now, let's return to our prohibition.

We have a warning again! We need to increase the sample size from 249 all the way up to 445 just to make up for the decrease in the quality of the design caused by the prohibitions. So, while the prohibitions may seem smart, they are very, very expensive!
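A back-of-the-envelope way to relate these numbers, assuming standard errors shrink in proportion to 1/sqrt(n): the sketch below estimates the sample size needed to reach the 0.05 target, and the standard error inflation implied by going from 249 to 445. The scaling rule is a common simplification and is an assumption here, not a description of how Displayr computes its warning. The 0.06 input is a made-up value.

```python
import math


def required_sample_size(current_n, worst_standard_error, target=0.05):
    """Smallest n that brings the worst standard error down to the target,
    assuming standard errors are proportional to 1 / sqrt(n)."""
    ratio = worst_standard_error / target
    # Small tolerance so floating-point noise does not round up an exact answer.
    return math.ceil(current_n * ratio ** 2 - 1e-9)


def implied_se_inflation(n_before, n_after):
    """Factor by which standard errors must have grown if n has to rise this much."""
    return math.sqrt(n_after / n_before)


# Hypothetical: the worst standard error is 0.06 at a sample of 300.
print(required_sample_size(300, 0.06))

# The jump from 249 to 445 quoted above, expressed as standard error inflation.
print(round(implied_se_inflation(249, 445), 2))
```

Under this assumption, a design whose standard errors are roughly a third larger needs nearly 80% more respondents, which is why prohibitions are so expensive.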

Efficient designs with priors

Let's remove the prohibitions.

There's yet another type of design called an efficient design. It's kind of similar in what it's trying to do to Complete enumeration. Just to remind you, we saw before we could have a sample of 249 with the Complete enumeration. We can do almost as well with efficient, with a sample size of 255 being OK. So, why do we care?

Remember last week we looked at the idea of utility. All the experimental designs we have used so far assume that our best guess of the utility of the brands is that they are all equal. And, also, that the price levels are all equally appealing.

This is obviously untrue. The cool thing about efficient designs is we don't need to make such a dumb assumption. We can enter a guess, called a prior.

I am going to start by setting the mean utility of General Motors to 0. Why 0? It's a useful convention to set the first level's utility to 0. We saw this in the introductory webinar, and we will see it again next week.

I will set BMW to 1, as I think it is, when you ignore price, more appealing as a brand than General Motors. And, I will set Ferrari to 2. Similarly, with price, I am using values of 0, -1, and -3.

Can I use any numbers? No. Best to create them in a scale of -3 to 3. If you want to get technical, they are logit scaled. Lots of people when they first see priors get scared. They say "I don't know enough. I don't want to do this." But, they are thinking about it the wrong way. All the designs we have used so far implicitly assume priors of 0. That's clearly wrong here. Our guesses need to be judged as to whether they are better than assuming 0. They obviously are.
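To make "logit scaled" concrete, the sketch below adds the brand and price priors for each alternative and converts the totals into predicted choice shares with the multinomial logit formula. It assumes the price values 0, -1 and -3 run from the lowest to the highest price and that utilities are additive; the price levels are labeled by position because the transcript does not list every price.

```python
import math

# Prior mean utilities from the walkthrough above.
BRAND_UTILITY = {"General Motors": 0.0, "BMW": 1.0, "Ferrari": 2.0}
# Assumption: 0, -1, -3 correspond to the low, middle and high price levels.
PRICE_UTILITY = {"low": 0.0, "mid": -1.0, "high": -3.0}


def choice_probabilities(alternatives):
    """Multinomial logit shares for a list of (brand, price) alternatives."""
    utilities = [BRAND_UTILITY[brand] + PRICE_UTILITY[price] for brand, price in alternatives]
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(utility - top) for utility in utilities]
    total = sum(weights)
    return [weight / total for weight in weights]


question = [("Ferrari", "low"), ("BMW", "mid"), ("General Motors", "high")]
probabilities = choice_probabilities(question)
for (brand, price), probability in zip(question, probabilities):
    print(f"{brand} at the {price} price: {probability:.2f}")
```

With these priors, nearly everyone is predicted to pick the cheap Ferrari, so a question like this tells us very little. With priors of 0 for everything, every alternative would get an equal share.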

We do have a couple of warnings that we need to address. The first is just telling us that in addition to making assumptions about the means, we could also make assumptions about the variance. I return to this later.

The second warning is telling us that we need to have more versions. So, let's set that to 10. We've got our warning about sample sizes back. We need a sample of 600 with these priors.

Now, pay close attention to what I am about to say, as it's really important. The previous sample size calculations all assumed that the truth is that brand makes no difference and people don't care about price. If the mean utilities are like the ones that I've entered in, a much higher sample size would be required for all the earlier designs. Back to the design.

Note how now we are only showing Ferrari 10 times at \$20K, and 56 times at the more realistic \$100K.

So, we've got the same type of thing that we wanted to achieve with prohibitions, but done it in a smarter way whereby it doesn't mean we need a much larger sample size. If we used prohibitions here, we would likely need a sample size of more than 1000. While it’s a better design, it’s not perfect. Really what we should do in this example is something called an Alternative Specific design, where we have a specified range of prices for each.

Alternative specific design

We do that by changing the algorithm to Alternative specific - Federov

We need to enter the attributes and levels in a different way.

We enter how many attributes we want for each alternative.

So, now I have a different price range for every brand.

We get an error message here. With this type of design we can't have 10 versions. With a bit of trial and error we can work out the best number of questions and versions. As you can see, now all our questions are realistic.

Now let's move onto the second case study.

Second case study

This example has 14 attributes. And, one of them, cuisine, has 8 levels. The good news is that with more attributes, things actually get easier.

The dumb questions created by the earlier designs are much more common when you've only got a few attributes!

Balanced overlap

In Displayr:
Insert > More > Choice Modeling > Experimental Design

We get that error message again, saying we need more versions. I'd tend to use 100 in a real world study. But, to make things a bit faster I will just use 5 here. Now, this will take a while to compute. If we are in a rush, we can use the shortcut design.

Once more Displayr is coaching us. It's again giving us the warning about sample size. We could play around to remove that warning, but I’ll leave that for you as homework.

It's telling us that each level should be shown at least 6 times to each individual. So, to fix this we need to increase the number of questions per respondent. Again, this is a trial and error thing.

Cool. Problem solved. But we've got another warning. It's telling us that this design may be too complicated for respondents. Let's have a look at what a respondent will be asked to choose.

In Displayr:

Insert > More > Choice Modeling > Preview Choice Questionnaire

Resize top right corner to top right of page

So this is the first question. Which would you choose?

Yep. I'm bored too. Nobody's going to read that. Let's look at what Displayr was coaching us to do.

It wants us to create a partial profiles design.

We need to specify how many of the attributes we don't want to show. I've got 14, so let's show 5, which means holding 9 as constant.

OK, this is a much more manageable question. A rough rule of thumb is that we don’t want a grid with more than 20 cells. We've currently got 15. So, we can add more alternatives or more attributes. In terms of sample size, adding more alternatives is desirable.
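The 20-cell rule of thumb is simple arithmetic: cells = attributes shown × alternatives. The short sketch below checks the current layout and the two ways of extending it. It assumes the current 15 cells come from 5 attributes and 3 alternatives, which the transcript implies but does not state.

```python
def grid_cells(attributes_shown, alternatives):
    """Cells in the choice grid: one per attribute shown per alternative."""
    return attributes_shown * alternatives


def options_within_limit(max_cells=20, min_attributes=2, min_alternatives=2):
    """All (attributes shown, alternatives) pairs that fit within the cell limit."""
    return [
        (attributes, alternatives)
        for attributes in range(min_attributes, max_cells + 1)
        for alternatives in range(min_alternatives, max_cells + 1)
        if grid_cells(attributes, alternatives) <= max_cells
    ]


print(grid_cells(5, 3))  # current layout (assumed 5 attributes x 3 alternatives)
print(grid_cells(5, 4))  # add one alternative
print(grid_cells(6, 3))  # add one attribute
```

Both extensions stay within the limit, so the choice between them comes down to other considerations, such as the sample size benefit of extra alternatives mentioned above.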

That's better. Better, but we've still got a problem. We've got some more warnings. It says there that some attributes are only shown 0 times in some versions, but an attribute should appear at least 3 times. If you do the math, we are going to need to have at least 40 questions. And, 40 questions will be too many. What can we do? I will return to that question a bit later. But I will just dig deep into more advanced topics for a bit.

Remember our earlier discussion of efficient designs with priors? They are just as relevant here. In this example, we've used priors for means and standard deviations. So, looking at price, for example, we're assuming that on average people have a utility of -3 for a price of \$30, but with a standard deviation of 3. One of my colleagues did this. I'd probably have used 1.5 myself, but these things tend not to make a huge difference. The trick is to put a guess in. It's fine for it just to be a guess. But you get the idea. Now, I'm not going to click OK on this, as it will run for a few hours!

Question types

We've now covered the main approaches to creating experimental designs and many of the key decisions that need to be made.

We need to make some other decisions as well. Once we have created a design, we can use any of the question formats I am about to show you.

Choice questions

All the examples and most real-world studies use choice questions like this.

Best-worst questions

A more complex approach is to use best-worst questions, where people are asked which they like most and which they like least.

The great thing about these questions is that because they collect more data, they require smaller sample sizes. But, there is a problem as well. The cool thing about choice questions is that in the real world people choose products and in choice questions they choose products.

But, in the real world people don't walk into shops and say which product they won't buy, so this question has what academics call poor ecological validity. That is, it's less likely to collect high quality data, all else being equal.

Ranking questions

We can also ask people to rank alternatives. This collects even more data. But the questions again have poor ecological validity.

Constant-sum questions

And, we can ask people constant sum questions. Again, even more data. But I'd strongly advise against these. In my opinion they violate one of the assumptions of conjoint.

The underlying math of conjoint assumes that the numbers people enter reflect their uncertainty. That is, they are treated as being proportional to probabilities. But they are much more likely to instead indicate variety seeking.

And, the questions are tedious to fill in so you get a lot of junk.

Numeric questions

I also think these tend not to be so clever to use. In addition to the variety seeking issue, none of the standard models can be used to analyze the data.

None of these and current

We can also offer people a choice of None of these and current options.

None of these

Note the option on the far right.

Most people who are new to choice-based conjoint instinctively think it is a good idea to add such an option, seeming to view it as a bit like having a Don’t know option in a normal questionnaire. However, including this option comes with considerable costs. In particular:

• When given such an option some people may click this option as an easy alternative to reading the other options. If that happens, then the validity of the entire study is poor.
• You need a larger sample size if using this option. For example, if the None option is chosen half the time, then the required sample size will, all else being equal, need to be twice as large.
• Often people do not have none of these as an option in the real world. For example, a family must have electricity and water and foodstuffs, so giving a none of these can sacrifice ecological validity.
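The sample size point in the list above can be written as simple arithmetic. The sketch assumes that a None answer carries no information about the trade-offs among the other options, so only the remaining share of answers counts. That is a simplification, and the base sample size of 300 is just an example.

```python
import math


def sample_size_with_none(base_sample_size, none_share):
    """Sample size needed when a share of answers pick 'None of these',
    assuming those answers say nothing about the other alternatives."""
    if not 0 <= none_share < 1:
        raise ValueError("none_share must be at least 0 and less than 1")
    return math.ceil(base_sample_size / (1 - none_share))


print(sample_size_with_none(300, 0.5))   # None chosen half the time
print(sample_size_with_none(300, 0.25))  # None chosen a quarter of the time
```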

I virtually never ask this style of question.

Current

A similar type of approach is to give users a current alternative.

This has all the problems with none, and also violates some complicated technical assumptions of the hierarchical Bayes model.

Dual

One solution to the None problem is to ask the dual response question. This solves most of the issues, and allows you to work out at analysis time whether or not the None of these choices are good at predicting real world behavior.

Sample size

We've already looked at aspects of sample size with the design.

Sample size and experimental

Some types of experimental designs require large sample sizes, such as prohibitions and partial profile designs. If we are using None of these or current options, we need to have bigger samples.

We can reduce sample sizes if we use the less realistic question types. There are a few proprietary techniques out there, such as adaptive choice based conjoint, which are also designed to collect more data per respondent, but they also do this by asking less realistic questions.

Approaches for determining sample size

So how do we work out the sample size?

One rule of thumb is that the standard errors must all be below 0.05, as discussed.

I find this useful in understanding the consequences of different design decisions, such as prohibitions.

But, it's just a made-up rule. There are so many impossible assumptions in the formulas that calculate the standard errors, such as the assumption that people have a utility of 0 for everything, which makes them largely meaningless.  This is exactly the same reason that commercial researchers rarely use sample size formulas in more conventional studies.

So, just like with a normal study, we've got heuristics we can use, such as a minimum sample size of 300.

Sawtooth have made a nice little formula that I've got here as a calculator. For example, if I halve the number of levels of the attribute with the most levels, the recommended sample size goes down by 50%.
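The transcript does not show the calculator itself. The sketch below assumes it is the widely quoted rule of thumb associated with Sawtooth Software, n × t × a / c ≥ 500, where n is the sample size, t the number of questions per respondent, a the number of alternatives per question, and c the largest number of levels of any attribute. The 10 questions and 3 alternatives are hypothetical inputs; the 8 levels match the cuisine attribute in the second case study.

```python
import math


def rule_of_thumb_sample_size(max_levels, questions, alternatives, constant=500):
    """Smallest n satisfying n * questions * alternatives / max_levels >= constant."""
    return math.ceil(constant * max_levels / (questions * alternatives))


# Hypothetical study: 10 questions per respondent, 3 alternatives per question.
print(rule_of_thumb_sample_size(max_levels=8, questions=10, alternatives=3))
print(rule_of_thumb_sample_size(max_levels=4, questions=10, alternatives=3))
```

Because the recommended sample size is proportional to the number of levels, halving the levels halves the recommendation, apart from rounding.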

If you want to get all scientific, the way is to use simulations. There's a post about this on the blog.

Fieldwork/data collection

Now we move onto data collection

Should we create

All the examples I've shown so far have been of boring grid questions.
Why not use prototypes, virtual reality and so on? You can. But most people don't. It makes the studies a lot more expensive. And, this in turn leads to less data, which just makes the studies noisy and expensive. Like everything it's a tradeoff.

Fieldwork tips

Here are some tips for doing fieldwork. I'll give you a few moments to read them.

Qualtrics

If you are using Qualtrics, you can automatically program the choice questionnaire if you have API access to Qualtrics.

In Displayr:

Insert > More > Choice Modeling > Export Design to Qualtrics