How to Work Out the Number of Classes in Latent Class Analysis
Latent class analysis, which is also known as finite mixture modeling, requires the analyst to specify the number of classes before the technique is applied. There are eight approaches to choosing the number of classes: cross-validation, information criteria, statistical tests, extent of association with other data, entropy, replicability, no small classes, and domain-usefulness.
Cross-validation
This approach involves using only a subset of the data for each subject (or whatever other unit of analysis is used) when fitting a model with a specified number of classes, and then computing some measure of fit (e.g., log-likelihood) of the fitted model to the observations not used in estimation. This is repeated for different numbers of classes (e.g., from 1 to 10), and the number of classes with the best fit is selected. A variant of this, called K-fold cross-validation, instead splits the sample into K groups and estimates K models, each fitted to K-1 of the groups and judged on its fit to the one group that was held out.
The strength of these approaches is that they rely on few assumptions. The weakness is that when there is little data, or only small differences in fit between the numbers of classes, these approaches are not highly reliable.
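The K-fold variant can be sketched as follows. This is a minimal illustration rather than a reference implementation: `fit` and `log_likelihood` are hypothetical callables standing in for whatever latent class software is used, and the function names are assumptions made for this example.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_fit(data, k, fit, log_likelihood, seed=0):
    """Sum of held-out log-likelihoods over the k folds (higher is better).

    fit(train_rows) -> model and log_likelihood(model, test_rows) -> float
    are supplied by the caller (hypothetical interface).
    """
    total = 0.0
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        model = fit(train)                    # estimate on K-1 groups
        total += log_likelihood(model, test)  # judge on the held-out group
    return total
```

To choose the number of classes, the function would be called once per candidate number of classes, with `fit` fixed to that number, and the candidate with the highest total retained.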
Information criteria
Provided there are no technical errors, it should always be the case that the more classes you have, the better the model will fit the data used to estimate it. At some point, however, adding more classes will overfit the data. Information criteria are heuristics that start with a computation of fit (the log-likelihood), and then penalize this based on the number of parameters, which grows with the number of classes. Information criteria commonly applied to the selection of the number of classes include the Bayesian information criterion (BIC), the deviance information criterion (DIC), and the consistent Akaike information criterion (CAIC).
Information criteria are easy to compute, but have, at best, weak theoretical support when applied to latent class analysis.
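As a rough illustration of how such criteria are computed from a fitted model's log-likelihood, consider the sketch below. It makes two assumptions: the parameter count is for a latent class model with binary items only, and CAIC is taken to be the consistent AIC, with penalty ln(n) + 1 per parameter. AIC is included only for comparison.

```python
import math

def n_free_parameters(n_classes, n_items):
    """Free parameters with binary items: class sizes plus item probabilities."""
    return (n_classes - 1) + n_classes * n_items

def information_criteria(log_lik, n_params, n_obs):
    """Return AIC, BIC and CAIC; lower values indicate a better trade-off."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
    }
```

The criteria would be computed for each candidate number of classes, and the number of classes with the lowest value of the chosen criterion selected.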
Statistical tests
Statistical tests, such as likelihood ratio tests, can also be used to compare different numbers of classes, where the distribution of the test statistic is estimated by bootstrapping.
Practical problems with this approach include that it is highly computationally intensive, that software is not widely available, and that the local optima that inevitably occur can badly bias the bootstrapped likelihood ratio test.
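The final step of a bootstrapped likelihood ratio test can be sketched as below. The expensive part is omitted because it depends on the software used: simulating data sets from the model with fewer classes and refitting both models to each one. The sketch assumes those bootstrap statistics have already been collected in a list, and the function names are illustrative.

```python
def likelihood_ratio_statistic(log_lik_fewer, log_lik_more):
    """2 x (log-likelihood with more classes - log-likelihood with fewer)."""
    return 2.0 * (log_lik_more - log_lik_fewer)

def bootstrap_p_value(observed_stat, bootstrap_stats):
    """Share of bootstrap statistics at least as large as the observed one.

    The +1 in the numerator and denominator counts the observed statistic
    itself, so the p-value is never exactly zero.
    """
    exceed = sum(1 for s in bootstrap_stats if s >= observed_stat)
    return (exceed + 1) / (len(bootstrap_stats) + 1)
```

A small p-value suggests that the extra class improves fit by more than would be expected if the model with fewer classes were true.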
Extent of association with other data
This approach involves assessing the extent to which each solution (i.e., the two-class solution, the three-class solution, etc.) is associated with other data. The basic idea is that the stronger the association with other data, the greater the likelihood that the solution is valid, rather than just reflecting noise.
A practical challenge with this approach is that any truly novel and interesting finding is one that does not relate strongly to existing classifications.
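One way to quantify the association, when the other data is categorical, is Cramér's V between the class assignments and the external variable. The choice of Cramér's V is an assumption made for this sketch; the approach itself does not prescribe a particular measure.

```python
import math
from collections import Counter

def cramers_v(classes, external):
    """Cramér's V between two categorical variables (0 = none, 1 = perfect)."""
    n = len(classes)
    row = Counter(classes)
    col = Counter(external)
    joint = Counter(zip(classes, external))
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    # With only one category on either side, association is undefined; use 0.
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0
```

The statistic would be computed for each candidate solution against the same external variable and the values compared across solutions.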
Entropy
An output from latent class analysis is an estimate of the probability that each subject (e.g., person) is in each of the classes. These probabilities can be summarized into a single number, called entropy, which takes a value of 1 when every subject has a probability of 1 of being in one class, and a value of 0 when the probabilities of being assigned to each class are the same for all subjects. (Sometimes this is scaled in reverse, where 0 indicates that every subject has a probability of 1 and 1 indicates constant probabilities.)
The main role of entropy is to rule out a number of classes when its entropy is too low (e.g., less than 0.8). The evidence in favor of selecting the number of classes based on entropy is weak.
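A common form of this statistic is the relative entropy: one minus the total entropy of the posterior probabilities, divided by its maximum possible value. The sketch below assumes that form and uses the scaling in which 1 means perfectly separated classes. The input layout, one list of class probabilities per subject, is also an assumption.

```python
import math

def relative_entropy(posteriors):
    """1 = every subject assigned with certainty; 0 = all probabilities equal."""
    n = len(posteriors)
    n_classes = len(posteriors[0])
    if n_classes < 2:
        return 1.0  # a single class is trivially certain
    total = 0.0
    for row in posteriors:
        # p * log(p) is taken to be 0 when p == 0
        total -= sum(p * math.log(p) for p in row if p > 0)
    return 1.0 - total / (n * math.log(n_classes))
```

Under the rule of thumb above, a solution would be set aside when this value falls below the chosen threshold, such as 0.8.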
Replicability
Replicability is computed by either randomly sampling with replacement (bootstrap replication) or splitting a sample into two groups. Latent class analysis is conducted in each of the replication samples. The number of classes that gives the most consistent results (i.e., consistent between the samples) is considered to be the best. This approach can also be viewed as a form of cross-validation.
Two challenges with this approach are that local optima may be more replicable than global optima (i.e., it may be easier to replicate a poor solution than a better solution), and that replicability declines as the number of classes increases, all else being equal.
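Consistency between two solutions can be measured in several ways. The sketch below uses the adjusted Rand index, which is an assumption of this example rather than something the approach requires. It presumes that the models estimated in two replication samples have each been used to assign the same set of subjects to classes, giving two label lists of equal length.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two classifications, corrected for chance.

    1 means identical partitions (class labels may be permuted);
    values near 0 mean agreement no better than chance.
    """
    n = len(labels_a)
    if n < 2:
        return 1.0
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. everything in a single class
    return (pairs_joint - expected) / (maximum - expected)
```

Averaging this index over many pairs of replication samples, separately for each number of classes, gives one replicability score per candidate.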
No small classes
The basic idea of this approach is that you choose the highest number of classes such that none of the classes are small (e.g., less than 5% of the sample). This rule has long been used in practice as a part of the idea of domain-usefulness but has recently been found also to have some theoretical justification (Nasserinejad, K., van Rosmalen, J., de Kort, W., & Lesaffre, E. (2017). Comparison of criteria for choosing the number of classes in Bayesian finite mixture models. PLoS ONE, 12).
A weakness of this approach is the difficulty of justifying the choice of cutoff value.
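The rule is simple to apply once each solution's class assignments are available. In this sketch, `solutions` is a hypothetical dictionary mapping each candidate number of classes to a list with one assigned class per subject, and the 5% cutoff is the example value from the text rather than a recommended default.

```python
from collections import Counter

def smallest_class_share(assignments):
    """Proportion of subjects in the smallest non-empty class."""
    counts = Counter(assignments)
    return min(counts.values()) / len(assignments)

def largest_without_small_classes(solutions, min_share=0.05):
    """Highest number of classes whose smallest class meets the cutoff.

    A solution in which some class received no subjects at all is rejected.
    Returns None if no solution qualifies.
    """
    ok = [k for k, a in solutions.items()
          if len(set(a)) == k and smallest_class_share(a) >= min_share]
    return max(ok) if ok else None
```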
Domain-usefulness
Perhaps the most widely used approach is to choose the number of classes that creates the solution that is, to the analyst, most interesting.