Using Cross-Validation to Measure MaxDiff Performance
This post compares various approaches to analyzing MaxDiff data using a method known as cross-validation. Before you read this post, make sure you first read How MaxDiff analysis works, which describes many of the approaches mentioned in this post.
Cross-validation refers to the general practice of fitting a statistical model to part of a data set (the in-sample data) and then evaluating the model's performance on the remaining data (the out-of-sample data). The fitted model makes predictions from the out-of-sample predictors, and these predictions are compared with the out-of-sample outcomes. Cross-validation is popular because it can be applied in the same way to almost any predictive model, without needing to deal with the theoretical details of the model.
For this comparison, I will partition the data by randomly leaving out 1 of the 6 questions per respondent (out-of-sample), so that the remaining 5 questions form the in-sample data. Individual-level parameters estimated from the in-sample data are used to predict the left-out question, and performance is measured by the proportion of respondents for whom the best alternative was correctly predicted.
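To make the procedure concrete, here is a minimal Python sketch of the leave-one-question-out split and the accuracy measure. It is an illustration only and not the code behind the results in this post: the function names, the toy data, and the rule of predicting the best alternative as the shown alternative with the highest estimated utility are all assumptions made for the example.

```python
import random


def choose_holdout(n_respondents, n_questions=6, seed=0):
    """Randomly pick one question index to leave out for each respondent."""
    rng = random.Random(seed)
    return [rng.randrange(n_questions) for _ in range(n_respondents)]


def split_questions(questions, holdout_index):
    """Split one respondent's questions into in-sample and out-of-sample parts."""
    in_sample = [q for i, q in enumerate(questions) if i != holdout_index]
    return in_sample, questions[holdout_index]


def predict_best(utilities, shown):
    """Assumed rule: predict the shown alternative with the highest utility."""
    return max(shown, key=lambda alt: utilities[alt])


def out_of_sample_accuracy(utilities_by_respondent, held_out):
    """Proportion of respondents whose held-out 'best' choice is predicted."""
    hits = sum(
        predict_best(utilities, q["shown"]) == q["best"]
        for utilities, q in zip(utilities_by_respondent, held_out)
    )
    return hits / len(held_out)


# Toy data (hypothetical): one held-out question per respondent, plus
# individual-level utilities that would come from the in-sample fit.
held_out = [
    {"shown": ["A", "B", "C"], "best": "A"},
    {"shown": ["B", "C", "D"], "best": "D"},
]
utilities_by_respondent = [
    {"A": 1.2, "B": 0.3, "C": -0.5, "D": 0.0},
    {"A": 0.1, "B": 0.9, "C": -0.2, "D": 0.4},
]
accuracy = out_of_sample_accuracy(utilities_by_respondent, held_out)
print(accuracy)
```

In the toy data the first respondent is predicted correctly and the second is not, so the sketch reports an accuracy of one half. The model fit itself is deliberately left out, since any method that returns individual-level parameters can be plugged in.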
Latent class analysis
The first model that I shall look at is latent class analysis, or more specifically latent class rank-ordered logit with ties. The table below shows the results for a 3-class latent class analysis on the technology data set seen in previous blog posts, in which respondents are asked to choose between technology companies. The in-sample prediction accuracy in this case is 60.8%, which is the percentage of questions in which the best alternative was correctly predicted. Note that these predictions are made on the same data that was used to fit the model, so this accuracy will be optimistic compared with out-of-sample prediction accuracy.
To test this statement, I run the same model but with one question per respondent randomly left out. The out-of-sample prediction accuracy is 56.3%, which, as expected, is lower than the in-sample prediction accuracy seen previously. This value is the percentage of respondents for whom the best alternative was correctly predicted in the question that was left out.
Boosted varying coefficients
The next model is known as boosted varying coefficients, a model that we have developed to make use of the covariates that are often present in the data set. It works by first running latent class analyses over the levels of a covariate instead of over respondents, i.e., it assigns class membership probabilities to the levels of the covariate rather than to respondents. The number of classes is chosen by fitting a model for every possible number of classes (up to the number of levels) and selecting the one with the lowest Bayesian information criterion (BIC). From the best model, individual-level parameters are computed for each respondent. If the one-class model turns out to be the best in terms of BIC, no model is selected and the covariate is excluded. This process is repeated for each covariate, where each subsequent model is boosted by the individual-level parameters from the previous model. By boosting, I mean that these individual-level parameters are added to the usual model parameters when running the model. The individual-level parameters resulting from this process are finally used to boost an individual-level latent class analysis (over respondents).
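The class-selection step for a single covariate can be sketched in Python as follows. This is a rough illustration and not our implementation: it uses the standard definition BIC = k ln(n) - 2 ln(L), the log-likelihoods and parameter counts are made up, and what counts as the number of observations n for MaxDiff data is an assumption left to the caller.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_n_classes(fits, n_obs):
    """Pick the number of classes with the lowest BIC for one covariate.

    fits maps a number of classes to (log_likelihood, n_params).
    Returns None when the one-class model wins, meaning the covariate
    would be excluded.
    """
    scores = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
    best = min(scores, key=scores.get)
    return (None if best == 1 else best), scores


# Hypothetical fits for a covariate with three levels.
fits = {1: (-4500.0, 9), 2: (-4400.0, 19), 3: (-4390.0, 29)}
chosen, scores = select_n_classes(fits, n_obs=300)
print(chosen)
```

With these made-up numbers the two-class model has the lowest BIC: the three-class model fits slightly better, but not by enough to offset the penalty for its extra parameters.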
The table below shows the results from a boosted varying coefficients model where the covariates are the likelihood of recommending certain companies (Apple, Microsoft, Google and Samsung). From the subtitle, we can see that 3 classes were chosen for Apple, Microsoft was excluded and 2 classes were chosen for Google and Samsung. A 3-class latent class analysis is run at the end, which is the same number of classes that was used in the previous section. The in-sample prediction accuracy is 62.3%, which is higher than that from latent class analysis by itself. This is somewhat expected, since the boosted varying coefficients model has a larger number of parameters. To account for this, we compare the BIC, which penalizes additional parameters. It is lower for boosted varying coefficients: 8696 versus 8873 for the latent class analysis. Based on the BIC, boosted varying coefficients is the better model.
The table below shows the results from a boosted varying coefficients model with one question left out. As before, the prediction accuracy drops, here to 58.3%, but this is still better than the 56.3% seen with the latent class analysis. This is further evidence that boosted varying coefficients is superior to latent class analysis.
Increasing the number of classes
The choice of 3 classes was arbitrary. With cross-validation, I can determine whether another number of classes results in a better model. The table below shows the results of a model which is the same as before but with 5 classes instead of 3. Prediction accuracy has gone up and BIC has gone down, from which I conclude that 5 classes is better than 3. This process can be repeated until the optimal number of classes is found.
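The search over the number of classes can be written as a simple loop. The Python sketch below assumes a hypothetical function, here called evaluate, that fits the model with a given number of classes and returns its out-of-sample prediction accuracy. The accuracies in the example are invented to show the mechanics and are not results from this data set.

```python
def choose_n_classes(evaluate, candidates):
    """Return the candidate with the highest out-of-sample accuracy.

    evaluate is a hypothetical callable: number of classes -> accuracy.
    Ties are resolved in favour of the candidate tried first.
    """
    best_k, best_accuracy = None, float("-inf")
    for k in candidates:
        accuracy = evaluate(k)
        if accuracy > best_accuracy:
            best_k, best_accuracy = k, accuracy
    return best_k, best_accuracy


# Invented accuracies standing in for real cross-validation runs.
toy_accuracy = {2: 0.50, 3: 0.55, 4: 0.60, 5: 0.58}
best_k, best_accuracy = choose_n_classes(toy_accuracy.get, sorted(toy_accuracy))
print(best_k, best_accuracy)
```

Passing the candidates in increasing order means that, when two models predict equally well, the one with fewer classes is kept.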
Increasing the number of questions left out
So far I have only shown results where one question has been left out. Leaving out more questions reduces the amount of data available to fit the model, which negatively affects performance, as shown in the table below. However, with 2 questions left out, I expect the variability in the prediction accuracy to go down, as performance is measured over twice as many data points (although the number of questions per respondent in the in-sample data set shrinks from 5 to 4). There is no exact answer to how many questions to leave out, but I would recommend leaving out fewer than half of the questions so that there is sufficient in-sample data to fit the model.
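The effect of doubling the number of predictions on variability can be approximated with the standard error of a proportion, sqrt(p(1 - p) / n). The Python sketch below is a back-of-the-envelope illustration under stated assumptions: the number of respondents is hypothetical, the accuracy is held fixed at the 58.3% reported above even though it would change with fewer in-sample questions, and predictions are treated as independent, which two questions from the same respondent are not.

```python
import math


def accuracy_standard_error(accuracy, n_predictions):
    """Standard error of a proportion, assuming independent predictions."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_predictions)


n_respondents = 300  # hypothetical sample size, not taken from the data set
accuracy = 0.583     # held fixed for illustration only

se_one_left_out = accuracy_standard_error(accuracy, n_respondents)
se_two_left_out = accuracy_standard_error(accuracy, 2 * n_respondents)
print(round(se_two_left_out / se_one_left_out, 3))
```

Under these assumptions the standard error shrinks by a factor of 1/sqrt(2), or about 0.707, whatever the sample size. Because answers from the same respondent are correlated, the reduction in practice is likely to be smaller.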
We have seen that cross-validation is a simple but powerful method of comparing MaxDiff model performance, and that its results agree with those given by BIC (i.e., when prediction accuracy increases, BIC goes down). The results above indicate that, when everything else is kept the same, the boosted varying coefficients model outperforms latent class analysis.
If you click here, you can log in to Displayr and see all the analyses that were used in this post.