# Comparing Choice Models and Creating Ensembles in Displayr

There are a variety of different models available in Displayr for choice modeling. In fact, Displayr is the best choice modeling software in the world. And did we mention it is free? You can read about "How to do Choice Modeling in Displayr" here!

In this post, we first describe how to easily compare models. We then demonstrate how to create an ensemble that combines the models and potentially improves prediction accuracy.

## Types of choice model

There are two main categories of choice model: hierarchical Bayes and latent class. Within these categories, models are further specified by other parameters such as the number of classes. We frequently want to experiment with a variety of different models in order to find the most accurate.

To illustrate the comparison, we are going to use 1, 2 and 3 class hierarchical Bayes models. This post describes how to set up a choice model in Displayr, and I'll also be using the eggs data described in that post. For each model we leave out 2 questions during the fitting process. The prediction accuracy on these 2 holdout questions provides an unbiased estimate of accuracy (unlike the accuracy on the questions used for fitting). The output for the 1 class model is shown below.
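The holdout accuracy calculation itself is simple: it is the fraction of held-out questions for which the model's predicted choice matches the respondent's actual choice. A minimal sketch (the choices below are made-up, not the eggs data):

```python
# Hypothetical predicted vs. actual choices for holdout questions
# (alternative indices per question; these numbers are illustrative only).
predicted = [0, 2, 1, 1, 0, 2, 2, 1]
actual    = [0, 2, 1, 0, 0, 2, 1, 1]

# Holdout accuracy: proportion of holdout questions predicted correctly.
correct = sum(p == a for p, a in zip(predicted, actual))
accuracy = correct / len(actual)
print(accuracy)  # 0.75
```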

## Comparing models

To create a table comparing several models, navigate to **Insert > More > Conjoint/Choice Modeling > Ensemble of Models**. Then drag models into the *Input models* box, or select them from the drop-down list.

If the *Ensemble* box is not checked, the resulting table just compares the models. When this is the case, it is not necessary that the models use the same underlying data. If the *Ensemble* box is checked then an additional model is created, which requires that the underlying models all use the same data.

The table for my 3 models is as follows. The best values for each measure are shaded in dark blue and the worst are shaded in light blue.

We can see that the 1 class model performs the best of the underlying models in terms of accuracy on the holdout questions. It is also the fastest to run and has superior BIC and log-likelihood metrics (which are measures of goodness of fit).

## How models are combined in an ensemble

To create an ensemble, we use the respondent utilities (also known as coefficients or parameters). I provide a brief overview here but this post describes the relationship between utilities and preference shares in more detail.

- Utilities are a measure of how much each respondent prefers each level of each attribute.
- The models fit (i.e., estimate) these utilities from the responses.
- The preference of a respondent for an alternative is calculated as *e* raised to the power of the sum of the utilities of each level of the alternative.
- The probability that the respondent will choose a specific alternative is given by the ratio of the preference for that alternative to the sum of the preferences of all possible alternatives.
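The steps above can be sketched in a few lines of Python (the utility values are hypothetical, not taken from the eggs data):

```python
import math

# Hypothetical level utilities for one respondent. Each inner list holds the
# utilities of the attribute levels making up one alternative.
alt_utilities = [
    [0.8, 0.3],   # alternative 1 (e.g. a particular weight and organic status)
    [0.2, -0.1],  # alternative 2
    [-0.4, 0.5],  # alternative 3
]

# Preference = e raised to the sum of the alternative's level utilities.
preferences = [math.exp(sum(u)) for u in alt_utilities]

# Choice probability = preference relative to the sum over all alternatives.
total = sum(preferences)
probabilities = [p / total for p in preferences]
print(probabilities)  # probabilities sum to 1; alternative 1 is most likely
```

This is the standard multinomial logit transformation from utilities to choice probabilities.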

The table below shows the utilities for the *Weight*, *Organic* and *Charity* attributes for the first 10 respondents. As we might expect, greater weights tend to have greater utilities.

The ensemble is created by averaging utility tables across the models.
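A minimal sketch of this averaging, assuming each model produces a respondents-by-parameters table of utilities (the numbers below are hypothetical):

```python
# Three hypothetical utility tables, one per model,
# each with 2 respondents x 2 parameters.
model_utilities = [
    [[0.8, 0.3], [0.2, -0.1]],  # model 1
    [[0.6, 0.5], [0.4, 0.1]],   # model 2
    [[1.0, 0.1], [0.0, 0.3]],   # model 3
]

n_models = len(model_utilities)
n_resp = len(model_utilities[0])
n_params = len(model_utilities[0][0])

# Ensemble utilities: element-wise mean across the models' tables.
ensemble = [
    [sum(m[r][p] for m in model_utilities) / n_models for p in range(n_params)]
    for r in range(n_resp)
]
print(ensemble)  # e.g. ensemble[0][0] is (0.8 + 0.6 + 1.0) / 3 = 0.8
```

The averaged table is then used in place of a single model's utilities when computing preference shares.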

## Why ensembles can improve accuracy

We can see from the previous table that the ensemble has a superior out-of-sample prediction accuracy to each of the 3 underlying models. Since the ensemble is created by averaging, it may be surprising that the ensemble accuracy isn't just the average accuracy.

To understand this effect, imagine you know nothing about tennis (maybe you don't need to imagine!) and you ask one person, "Who is the best male tennis player in the world?" They reply, "Roger Federer". Depending on how much you think that person knows, you will trust their answer to a certain degree. Now you ask the same question of another 99 people. If their answers all agree, or there is a significant majority, you will be more confident that Roger really is the best. If you get a mixture of responses including Rafael Nadal and Novak Djokovic, then you would be much less sure who will win the next grand slam tournament.

Ensembles work in a similar manner. Each model makes predictions and some models will be better than others at predicting in a specific situation. By taking the average utilities we reduce the noise from individual models (noise is technically known as *variance* in this situation).

It's also important to consider model correlation. If the models are very similar, then the benefit from averaging will be small. In the extreme case of identical models, each additional model brings nothing new and there is no increase in accuracy. If the models are diverse, and each is a good predictor in different situations, then the increase in accuracy is large. In practice the models tend to be similar, so the benefit is usually small; it is nonetheless tangible enough that the winners of prediction competitions almost always use ensembles.
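This trade-off can be illustrated with the textbook formula for the variance of an average of *n* equally noisy, equally correlated predictions (this is a general statistical result, not a computation Displayr performs):

```python
# Variance of the mean of n predictions, each with variance sigma2 and
# pairwise correlation rho: sigma2 * (1 + (n - 1) * rho) / n.
def ensemble_variance(sigma2, n, rho):
    return sigma2 * (1 + (n - 1) * rho) / n

print(ensemble_variance(1.0, 3, 1.0))  # 1.0: identical models, no reduction
print(ensemble_variance(1.0, 3, 0.0))  # ~0.33: independent models, variance / n
print(ensemble_variance(1.0, 3, 0.8))  # partially correlated: modest reduction
```

With perfectly correlated (identical) models the noise does not shrink at all, while diverse models drive the noise toward variance divided by *n*.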

## Ensemble parameter histograms

By setting *Output* to *Ensemble* we can visualize the respondent utility distributions in the same manner as for the underlying models. This is shown below.

We can also use **Insert > More > Conjoint/Choice Modeling > Save Variable(s)** to add the coefficients or proportion of correct predictions to the data set.

Read more about **market research**, or **try this analysis** yourself! The **flipChoice** R package, which uses the **rstan** package, creates the hierarchical Bayes models and ensemble.