The effective sample size is an estimate of the size of a simple random sample that would achieve the same level of precision as the actual sample. Mathematically, it is defined as n/D, where n is the actual sample size and D is the design effect. It is used as a way of summarizing the amount of information in data. It has three main areas of application: survey analysis, time series analysis, and Bayesian statistics (MCMC).
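The n/D definition can be sketched in a few lines of Python. The function name and the example design effect of 2 are illustrative assumptions, not values from any particular survey.

```python
def effective_sample_size(n: float, design_effect: float) -> float:
    """Return n / D, the effective sample size for a design effect D."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    return n / design_effect


# Hypothetical example: a sample of 1,000 with a design effect of 2
# carries about as much information as a simple random sample of 500.
print(effective_sample_size(1000, 2.0))
```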

How effective sample size is used

The main application of effective sample size calculations is for qualitative assessments of the sample size. For example, if it is believed that a sample size of 30 is required for an analysis to be valid, then the effective sample size – rather than the actual sample size – is used in such an assessment.

Sometimes effective sample sizes are used as an input into statistical calculations in place of the actual sample size. This practice is better than using the actual sample size but is only a rough heuristic. (In general, a better approach is to use statistical techniques specifically designed for non-simple random samples, such as complex samples regression.)
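As a rough illustration of this heuristic, the sketch below computes the standard error of a mean using the effective sample size in the denominator instead of the actual sample size. The function name, the data, and the effective sample size of 2.5 are hypothetical, and the unweighted standard deviation is used purely for simplicity.

```python
import math
import statistics


def standard_error_of_mean(values, n_eff=None):
    """Standard error of the mean, optionally using an effective sample size.

    If n_eff is None, the actual sample size len(values) is used.
    """
    n = len(values) if n_eff is None else n_eff
    return statistics.stdev(values) / math.sqrt(n)


values = [1, 2, 3, 4, 5]  # hypothetical data
naive_se = standard_error_of_mean(values)                # uses n = 5
adjusted_se = standard_error_of_mean(values, n_eff=2.5)  # uses assumed n_eff
print(naive_se, adjusted_se)
```

The adjusted standard error is larger because the effective sample size is smaller than the actual one. As the text notes, this is only a rough heuristic and not a substitute for methods designed for complex samples.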

Effective sample size in surveys

In survey analysis, the way that a survey is designed affects the precision of survey estimates (i.e., the standard error of statistics). Clustering and weighting usually increase the standard errors of estimates in real-world surveys, while stratification can reduce them when the strata are internally homogeneous.

Most commonly, effective sample size is used as a way of quantifying the effect of weighting a survey. For example, if a survey of 1,000 people has an effective sample size for a statistic of 500, it means that the amount of sampling error is equivalent to that which would have been obtained by a study of 500 people that did not need to be weighted.

A common misunderstanding in survey analysis is that a survey has a single effective sample size. This is rarely the case. Most statistics that are calculated have their own effective sample size. For example, if you compute the effective sample size for the average of one variable, it will typically be different from the effective sample size computed for another variable. Where all the statistics have the same effective sample size, it means that an approximation of some kind has been used.
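One widely used approximation of this kind is Kish's formula, which depends only on the weights and therefore gives the same effective sample size for every statistic. The text above does not name a specific approximation, so treating Kish's formula as the example is an assumption. The weights below are hypothetical.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


# Hypothetical survey of 1,000 respondents: half have weight 1, half weight 3.
weights = [1.0] * 500 + [3.0] * 500
print(kish_effective_sample_size(weights))
```

With equal weights the formula returns the actual sample size, and the more variable the weights are, the smaller the result becomes.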

Effective sample size in time series analysis

Positive autocorrelation in a time series also reduces the effective sample size. For example, if the first-order autocorrelation is 0.5, then 100 observations have an effective sample size of only about 33.
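The figure of 33 is consistent with a common approximation for a first-order autoregressive series, n(1 - r)/(1 + r), where r is the first-order autocorrelation. The sketch below assumes that formula, and the function name is illustrative.

```python
def ar1_effective_sample_size(n: float, rho1: float) -> float:
    """Approximate effective sample size for first-order autocorrelation rho1.

    Assumes the series behaves like an AR(1) process: n * (1 - rho1) / (1 + rho1).
    """
    if not -1 < rho1 < 1:
        raise ValueError("rho1 must lie strictly between -1 and 1")
    return n * (1 - rho1) / (1 + rho1)


print(round(ar1_effective_sample_size(100, 0.5)))  # 33
```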

Effective sample size in Bayesian statistics (MCMC)

In Bayesian statistics, it is common to use the posterior draws from Markov chain Monte Carlo (MCMC) for statistical inference. The MCMC process causes the draws to be correlated, which means that the effective sample size is generally lower than the number of draws. For this reason, the effective sample size – rather than the number of draws – is typically used when judging whether an MCMC run has produced enough draws for reliable inference, alongside other diagnostics used to check convergence.
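A minimal sketch of the idea for a single chain divides the number of draws by 1 plus twice the sum of the autocorrelations. Stopping the sum at the first non-positive autocorrelation is a simplifying assumption made here. MCMC software typically uses more refined, multi-chain estimators, so this is not a description of any particular package. The simulated chains are hypothetical stand-ins for posterior draws.

```python
import random


def autocorrelation(draws, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((x - mean) ** 2 for x in draws)
    if var == 0:
        raise ValueError("draws have zero variance")
    cov = sum((draws[i] - mean) * (draws[i + lag] - mean) for i in range(n - lag))
    return cov / var


def mcmc_effective_sample_size(draws):
    """Single-chain estimate: n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation, a simple
    rule chosen for this sketch to keep noise in the tail out of the estimate.
    """
    n = len(draws)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = autocorrelation(draws, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


rng = random.Random(0)
n_draws = 2000

# Independent draws: the effective sample size should be close to n_draws.
iid_draws = [rng.gauss(0, 1) for _ in range(n_draws)]

# Strongly correlated draws from a simulated AR(1) process with coefficient 0.9.
ar1_draws = [0.0]
for _ in range(n_draws - 1):
    ar1_draws.append(0.9 * ar1_draws[-1] + rng.gauss(0, 1))

print(mcmc_effective_sample_size(iid_draws))
print(mcmc_effective_sample_size(ar1_draws))
```

The correlated chain should report a much smaller effective sample size than the independent one, even though both contain the same number of draws.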

Acknowledgments

The time series calculation is from https://imedea.uib-csic.es/master/cambioglobal/Modulo_V_cod101615/Theory/TSA_theory_part1.pdf.