Tips for Recoding Missing Values in Q
Missing values in a data set are "blank" values. They are normally associated with survey skips: respondents who skip a variable/question receive a missing value. Missing values are necessary to base a question appropriately. Sometimes you set certain non-missing values to be treated as missing (a typical case is "Don't Know" responses). Sometimes, however, you want to go the other way and include missing values in the base. So although the questionnaire has a survey skip, you now want the results of a question to include the respondents who skipped it.
Simple and complicated recoding of missing values
If you choose to do this, it could be a simple case of recoding the missing value into another single value that is considered non-missing for the sake of analysis. Or, it could need more complicated recoding, because you want to allocate the missing values into two or more codes.
A single recode scenario: You ask respondents which brands they would consider purchasing, but you only show the brands they said they were aware of earlier in the questionnaire. However, when it comes to reporting time, you decide you actually want the proportion of “considerers” among the total base (rather than just those who are aware of the brand).
A multiple recode scenario: You have a variable that indicates which segment respondents belong to, but a proportion of that sample have missing values on the segmentation membership variable. Now you want to complete the segmentation membership variable in the data set (so everyone has a segment). In doing so, you need to create an algorithm that decides which segment each respondent should belong to.
There are several ways to recode missing values in Q. I'll look here specifically at the two scenarios above. There are other scenarios in which you may want to look at recoding missing values, such as mean substitution for numeric variables (which we do not cover in this post).
How to do a simple recode in Q for a single result
Recoding the missing values for a question into a single other value takes just a couple of clicks.
- (optional): Duplicate the question first (and then work with the duplicate)
- Put the question you’re going to change into the blue drop-down, with "SUMMARY" in the brown drop-down
- Right-click on the question (rows or columns) and then select Values
- In the values, toggle off the box for the Missing Values
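Conceptually, these steps turn the missing value into an ordinary category, so the base grows to include everyone. The sketch below illustrates that effect outside Q under stated assumptions: the function names, the code 99, and the sample data are hypothetical, and Q itself does this through the Values dialog rather than through code.

```javascript
// Hypothetical code for the category that replaces missing values.
const NEW_CODE = 99;

// Replace every missing value (NaN) with a single non-missing code.
function recodeMissing(values, newCode) {
    return values.map(v => (isNaN(v) ? newCode : v));
}

// The base is the number of non-missing cases.
function base(values) {
    return values.filter(v => !isNaN(v)).length;
}

// Made-up responses: NaN marks respondents who skipped the question.
const responses = [1, 2, NaN, 3, 1, NaN, 2, 1];
const recoded = recodeMissing(responses, NEW_CODE);

console.log("Base before:", base(responses));
console.log("Base after:", base(recoded));
```

After the recode, the base equals the full number of respondents, and the new code shows up as its own row in the table.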
Consider the table below. The base is 452, representing those who said they were working in the preceding question. The total base of the data set is 500. By going into the Values of this question, we see this:
After toggling off the check for missing data and changing the label from “Missing Data” to “Not working”, the question displays as below:
Here's another example showing a grid situation. Consider the Pick Any - Grid question below. The base for each of the cells is different. Respondents are only asked about supermarkets they are familiar with.
In the question’s values, deselect the Missing data box, so that "Missing Data" and "Not selected" are included in the base:
Then, each cell in the above example grid rebases to 500 – the total sample in the survey.
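The rebasing arithmetic for a single cell can be sketched as follows. This is a minimal illustration, not Q's implementation: the function name percentSelected, the 1/0/NaN coding, and the ten made-up responses are all assumptions.

```javascript
// Assumed coding: 1 = selected, 0 = not selected, NaN = missing (not asked).
function percentSelected(values, includeMissing) {
    // With missing values included, the base is everyone;
    // otherwise it is only the respondents who were asked.
    const base = includeMissing
        ? values.length
        : values.filter(v => !isNaN(v)).length;
    const selected = values.filter(v => v === 1).length;
    return base === 0 ? NaN : (100 * selected) / base;
}

// Made-up cell: 10 respondents, 3 of whom were not asked.
const cell = [1, 0, 1, NaN, NaN, 1, 0, NaN, 1, 0];

console.log(percentSelected(cell, false)); // base of 7 (those asked)
console.log(percentSelected(cell, true));  // base of 10 (total sample)
```

The count of respondents who selected the option does not change; only the denominator does, so the percentage on the total base is never higher than the percentage among those asked.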
When there are multiple results, things are slightly more complicated than the above. Provided you have information about what these missing values should become, you can create a new variable with all the information you need. Consider the example below, where there is incomplete data on a segmentation variable.
The cases with missing data could be allocated to A, B, C, or D, or perhaps to a new segment entirely. To fix this, we’ll make a new variable that follows a series of rules to allocate each case to a category. If the respondent already belongs to a segment, then they will stay in that segment. If they are without a segment, then we'll use other variables to figure out which segment they should be in.
- Check it’s working (in the preview panel)
- Push OK (to generate the new variable).
The tricky bit is knowing what the expression is going to be. In this worked example, I would use code like the following:
if (!isNaN(segment)) segment;
else if (Q3 == 1 && Q4 <= 5) 1;
else if (Q3 == 1 && Q4 >= 6) 2;
else if (Q3 == 2 && Q4 <= 5) 3;
else if (Q3 == 2 && Q4 >= 6) 4;
The first line says that anyone who already has a non-missing value in the variable segment keeps that value. The next line is then saying “OK, well, for the rest of you (who must logically have missing values in the variable segment), if you happen to be male and under the age of 35, then you will get a ‘1’.” Each of the subsequent lines follows the same logic. In this example the segments align to a demographic profile, but the rules could be based on anything. You may have more complicated IF statements involving other Boolean operators (OR, NOT) and several variables. Check out our guide to IF and ELSE IF statements to help you write the statements from line 2 downwards.
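One way to check the rules before relying on them is to run the same logic as an ordinary JavaScript function on a few test cases. The sketch below is not Q syntax: the function name allocateSegment and the sample respondents are hypothetical, and the final fallback to NaN is an addition that makes uncovered cases visible.

```javascript
// Stand-alone version of the allocation rules, for testing outside Q.
// NaN represents a missing value, as in the expression above.
function allocateSegment(segment, Q3, Q4) {
    if (!isNaN(segment)) return segment;   // already has a segment: keep it
    if (Q3 == 1 && Q4 <= 5) return 1;
    if (Q3 == 1 && Q4 >= 6) return 2;
    if (Q3 == 2 && Q4 <= 5) return 3;
    if (Q3 == 2 && Q4 >= 6) return 4;
    return NaN;                            // no rule matched: stays missing
}

// Made-up respondents covering each rule, plus one that no rule covers.
const respondents = [
    { segment: 3,   Q3: 1,   Q4: 2 },
    { segment: NaN, Q3: 1,   Q4: 2 },
    { segment: NaN, Q3: 1,   Q4: 7 },
    { segment: NaN, Q3: 2,   Q4: 5 },
    { segment: NaN, Q3: 2,   Q4: 6 },
    { segment: NaN, Q3: NaN, Q4: 3 },
];

const completed = respondents.map(r => allocateSegment(r.segment, r.Q3, r.Q4));
console.log(completed);
```

Any NaN left in the result points to a respondent that none of the rules covers, for example because Q3 or Q4 is itself missing, and suggests that another rule is needed.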
The resulting variable will be numeric in type. You need to turn it into a categorical variable and check the Values (re-labeling anything as appropriate).
See for yourself
The above questions are contained in this QPack, for you to work with.