Rebasing involves modifying a calculation by changing the sample (base) used in the calculation. For example, if 40% of people say they will vote Democrat and 40% say Republican, and 20% say Don’t Know, when the data is rebased to exclude the Don’t Know responses, the result changes to 50% Democrat and 50% Republican. Rebasing is commonly performed to remove ambiguous responses from data and to adjust for screening criteria.

Rebasing to remove ambiguous responses

Some survey data is ambiguous. For example, the chart below shows results from the 2017 Developer Survey by the question-and-answer site Stack Overflow. It shows that 7.6% of the sample identified as female. A simple interpretation of this data is that 7.6% of respondents are female. However, this is unlikely to be correct. The percentages shown add up to 98.8%, so the remaining 1.2% have presumably either not responded or belong to some other gender category that has not been shown.


The simplest way to rebase the numbers is to exclude this missing 1.2%. This changes the computed percentage of females to 7.6% / (100% - 1.2%) = 7.7%. By rebasing the data in this way, we are implicitly assuming that the 1.2% have the same gender split as the rest of the sample.

We may also wish to rebase the data to include only people who identified as male or female, in which case the percentage of women becomes 7.6% / (88.6% + 7.6%) = 7.9%.
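The two calculations above can be checked with a few lines of Python. This is only a sketch of the arithmetic. The variable names are illustrative and the percentages are the ones quoted in the text.

```python
# Percentages quoted in the text for the 2017 Stack Overflow survey.
female = 7.6
male = 88.6
shown_total = 98.8  # sum of all gender categories shown; 1.2% is missing

# Rebase 1: exclude the missing 1.2% (divide by what remains).
female_excl_missing = female / shown_total * 100

# Rebase 2: keep only respondents who identified as male or female.
female_of_male_female = female / (male + female) * 100

print(round(female_excl_missing, 1))    # 7.7
print(round(female_of_male_female, 1))  # 7.9
```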

Rebasing to adjust for screening criteria

Consider a survey which has identified that among cola drinkers, the average cola consumption is 1.5 glasses per day. If 30% of people are cola drinkers (i.e., were screened into the survey), we can rebase this 1.5 to compute the average cola consumption across all people. The calculation is 1.5 * 30% = 0.45 glasses per day. This assumes that the people who were screened out of the survey drink no cola.

The math of rebasing

When excluding a group from a calculation, rebasing involves dividing by the percentage of the sample that remains after the group is excluded. For example, if 40% of people say they will vote Democrat and 20% say they don’t know, we rebase by dividing the 40% by 100% - 20%, which gives 40% / 80% = 50%.
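As a minimal sketch of this rule, the hypothetical helper below (`rebase_excluding` is an illustrative name, not a function from any library) drops the excluded categories and divides each remaining percentage by the share of the sample that is left.

```python
def rebase_excluding(shares, excluded):
    """Rebase percentage shares after dropping the categories in `excluded`.

    `shares` maps category name -> percentage of the original sample.
    """
    remaining = sum(v for k, v in shares.items() if k not in excluded)
    if remaining <= 0:
        raise ValueError("no sample remains after excluding these categories")
    # Divide each kept share by the remaining base and express as a percentage.
    return {k: v / remaining * 100 for k, v in shares.items() if k not in excluded}


votes = {"Democrat": 40.0, "Republican": 40.0, "Don't Know": 20.0}
rebased = rebase_excluding(votes, {"Don't Know"})
print(rebased)  # {'Democrat': 50.0, 'Republican': 50.0}
```

Like the manual calculation, this implicitly assumes the excluded group would split in the same proportions as everyone else.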

When we include a group that was excluded from the original calculation, we multiply by the proportion of the wider group that was included in the original calculation. For example, in the earlier example we multiplied the average of 1.5 glasses of cola by the 30% representing the proportion of people who drink colas.
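The same multiplication can be sketched in Python. The function name `rebase_to_population` is made up for illustration, and the code carries the assumption that everyone outside the screened group contributes zero to the average.

```python
def rebase_to_population(mean_among_screened, incidence):
    """Rebase an average from a screened group to the whole population.

    `incidence` is the proportion (0 to 1) of the population that was
    screened in. Assumes people who were screened out contribute zero.
    """
    if not 0 <= incidence <= 1:
        raise ValueError("incidence must be a proportion between 0 and 1")
    return mean_among_screened * incidence


# 1.5 glasses per day among cola drinkers, who are 30% of all people.
overall = rebase_to_population(1.5, 0.30)
print(round(overall, 2))  # 0.45
```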

Sometimes rebasing calculations are more complicated and involve recalculating all the numbers using the original data. For example, if we wanted to rebase the percentage of people who said they would vote Democrat to reflect only people who had voted in the previous presidential election, we would need to use the original raw data to perform the calculation. Although this last approach is sometimes called rebasing, it is more commonly known as filtering or subgroup analysis.
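To illustrate why the raw data is needed, here is a small sketch of filtering. The respondent records and field names are invented purely for the example and are not real survey data.

```python
# Hypothetical respondent-level records (invented for illustration).
respondents = [
    {"vote": "Democrat", "voted_last_election": True},
    {"vote": "Republican", "voted_last_election": True},
    {"vote": "Democrat", "voted_last_election": False},
    {"vote": "Don't Know", "voted_last_election": False},
    {"vote": "Democrat", "voted_last_election": True},
]


def share(records, answer):
    """Percentage of `records` whose vote equals `answer`."""
    if not records:
        raise ValueError("no records in this subgroup")
    count = sum(1 for r in records if r["vote"] == answer)
    return 100 * count / len(records)


# Filter to the subgroup first, then recompute the percentage from scratch.
prior_voters = [r for r in respondents if r["voted_last_election"]]

print(share(respondents, "Democrat"))             # 60.0
print(round(share(prior_voters, "Democrat"), 1))  # 66.7
```

The filtered percentage cannot be derived from the overall 60% alone, because it depends on how each individual respondent answered both questions.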