Missing values in a data set are "blank" values. They normally arise from survey skips: respondents who skip a variable or question receive a missing value. Missing values are needed to base a question appropriately. Sometimes you set certain non-missing values to be treated as missing (a typical case is "Don't Know" responses). Sometimes, however, you want to go the other way and include missing values in the base. So although the questionnaire has a survey skip, you now want to analyse the results of a question with the respondents who skipped it included.

Simple and complicated recoding of missing values

If you choose to do this, it could be a simple case of recoding the missing value into another single value that is considered non-missing for the sake of analysis. Or, it could need more complicated recoding, because you want to allocate the missing values into two or more codes.

A single recode scenario: You ask respondents which brands they would consider purchasing, but you only show the brands they said they were aware of earlier in the questionnaire. However, when it comes to reporting time, you decide you actually want the proportion of “considerers” among the total base (rather than just those who are aware of the brand).

A multiple recode scenario: You have a variable that indicates which segment respondents belong to, but a proportion of the sample have missing values on the segmentation membership variable. Now you want to complete the segmentation membership variable in the data set (so everyone has a segment). To do so, you need to create an algorithm that decides which segment each respondent should belong to.

There are several ways to recode missing values in Q. I'll look here specifically at the two scenarios above. There are other scenarios in which you may want to look at recoding missing values, such as mean substitution for numeric variables (which we do not cover in this post).

How to do a simple recode in Q for a single result

Recoding the missing values for a question into a single other value takes just a couple of clicks.

  • (optional): Duplicate the question first (and then work with the duplicate)
  • Put the question you’re going to change into the blue drop-down, with "SUMMARY" in the brown drop-down
  • Right-click on the question (rows or columns) and then go to the Values
  • In the values, toggle off the box for the Missing Values

Consider the table below. The base is 452, representing those who said they were working in the preceding question. The total base of the data set is 500. By going into the Values of this question, we see this:

Values attributes box

After toggling off the check for missing data and changing the label from “Missing Data” to “Not working”, the question displays as below:

Tables with missing values after recoding missing values
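
To see what the toggle does to the numbers, the rebasing can be sketched in plain JavaScript outside Q. The bases of 452 and 500 come from the example above; the helper name percentOfBase and the count of 226 are made up purely for illustration.

```javascript
// Hypothetical helper: percentage of a category relative to a chosen base.
function percentOfBase(count, base) {
    return (100 * count) / base;
}

const totalSample = 500;  // everyone in the data set (from the example)
const workingBase = 452;  // respondents with a non-missing value (from the example)
const notWorking = totalSample - workingBase; // the former missing values: 48

// Illustrative count only, not taken from the real table.
const exampleCount = 226;

// Missing values excluded from the base: 226 / 452
const before = percentOfBase(exampleCount, workingBase);
// Missing values included as "Not working": 226 / 500
const after = percentOfBase(exampleCount, totalSample);
// The new "Not working" row: 48 / 500
const notWorkingShare = percentOfBase(notWorking, totalSample);

console.log(before, after, notWorkingShare); // 50 45.2 9.6
```

The counts in each existing row do not change; only the denominator does, which is why every percentage shrinks once the 48 extra cases join the base.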

Here's another example showing a grid situation. Consider the Pick Any - Grid question below. The base for each of the cells is different. Respondents are only asked about supermarkets they are familiar with.

Grid with varying base

In the question’s values, deselect the Missing data box, so that "Missing Data" and "Not selected" are included in the base:

Binary missing values

Then, each cell in the above example grid rebases to 500 – the total sample in the survey.
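
The same idea for a single cell of the grid can be sketched as follows. This is plain JavaScript rather than anything Q runs internally; the ten responses and the helper name percentSelected are invented for illustration.

```javascript
// Each value: 1 = selected, 0 = not selected, NaN = not asked (missing).
// Made-up responses for one grid cell, for illustration only.
const responses = [1, 0, NaN, 1, NaN, 0, 1, NaN, 0, 1];

// Hypothetical helper: % selected, with missing values either excluded
// from the base or counted in the base alongside "not selected".
function percentSelected(values, includeMissingInBase) {
    const base = includeMissingInBase
        ? values.length
        : values.filter(v => !isNaN(v)).length;
    const selected = values.filter(v => v === 1).length;
    return { base: base, percent: (100 * selected) / base };
}

const excluded = percentSelected(responses, false);
const included = percentSelected(responses, true);

console.log(excluded.base); // 7  (only those asked about the supermarket)
console.log(included.base); // 10 (the whole sample)
console.log(included.percent); // 40
```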

How to do a recode in Q for multiple possible results (using JavaScript)

When there are multiple results, things are slightly more complicated than the above. Provided you have information about what these missing values should become, you can create a new variable with all the information you need. Consider the example below, where there is incomplete data on a segmentation variable.

Segment variable attributes box

The cases with missing data could be allocated to A, B, C, or D, or perhaps to a new segment entirely. To fix this, we’ll make a new variable that follows a series of rules to allocate each case to a category. If the respondent already belongs to a segment, then they will stay in that segment. If they are without a segment, then we'll use other variables to figure out which segment they should be in.

To do this, we’ll make a JavaScript variable. On the Q Wiki we have a comprehensive guide to JavaScript variables and the use of simple JavaScript (including a video). In essence, the key steps are:

  • Go to the Variables and Questions tab, right-click > Insert Variable > JavaScript Variable > Numeric
  • Write an expression (in JavaScript)
  • Check it’s working (in the preview panel)
  • Push OK (to generate the new variable).

The tricky bit is knowing what the expression is going to be. In this worked example, I would use code like the following:

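if (!isNaN(segment)) segment;            // already has a segment: keep it
else if (Q3 == 1 && Q4 <= 5) 1;          // otherwise allocate using Q3 and Q4
else if (Q3 == 1 && Q4 >= 6) 2;
else if (Q3 == 2 && Q4 <= 5) 3;
else if (Q3 == 2 && Q4 >= 6) 4;
else NaN;                                // no rule matched: leave as missing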

Let me break it down for you. The first line says: “if the segment variable is NOT a missing value, then just keep the segmentation variable’s value”. The key function here is isNaN(variable), which asks the question “is this variable a missing value?” and returns true or false. NaN means “Not a Number” in JavaScript. Crucially, the exclamation mark (!) before the function reverses it, so the test becomes “is this variable NOT a missing value?”. If that is true, we keep whatever the original segmentation value was.

The next line is then saying “OK, well, for the rest of you (who must logically have missing values in the variable segment), if you happen to be male (Q3 is 1) and under the age of 35 (Q4 is 5 or lower), then you will get a ‘1’.” Each of the subsequent lines follows the same logic. In this example the segments align to a demographic profile, but the rules could be based on anything. You may have more complicated IF statements involving other Boolean operators (OR, NOT) and several variables. Check out our guide to IF and ELSE IF statements to help you write the statements from line 2 downwards.
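
If you want to try the rules outside Q before pasting the expression in, they can be wrapped in an ordinary function. This is only a sketch: allocateSegment is a hypothetical name, the arguments stand in for the Q variables segment, Q3 and Q4, and the final fallback to NaN for cases that match no rule is an assumption about what you would want.

```javascript
// Plain-JavaScript version of the allocation rules, for testing outside Q.
function allocateSegment(segment, Q3, Q4) {
    if (!isNaN(segment)) return segment;   // keep an existing segment
    if (Q3 == 1 && Q4 <= 5) return 1;
    if (Q3 == 1 && Q4 >= 6) return 2;
    if (Q3 == 2 && Q4 <= 5) return 3;
    if (Q3 == 2 && Q4 >= 6) return 4;
    return NaN;                            // assumption: unmatched cases stay missing
}

console.log(allocateSegment(3, 1, 2));     // 3 (existing segment is kept)
console.log(allocateSegment(NaN, 1, 4));   // 1
console.log(allocateSegment(NaN, 2, 7));   // 4
console.log(allocateSegment(NaN, NaN, 3)); // NaN (Q3 is missing too)
```

Note that any comparison with NaN is false in JavaScript, so a respondent who is also missing on Q3 or Q4 falls through every rule.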

Whenever you make a JavaScript variable, as noted in the video, be sure to check the Preview panel at the bottom. If you click Collapse duplicate inputs, you can see more clearly whether the different combinations of input variables are generating the right output. Look at the 5th and 10th lines in the preview below: these cases had missing (NaN) values in the original variable, but the code correctly allocates them to a segment based on the rules above.
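
The idea behind collapsing duplicate inputs can be imitated in plain JavaScript: keep one row per distinct combination of inputs and show the output for each. This is a rough sketch, not how Q implements the Preview panel; collapseDuplicateInputs, the sample rows, and the compact rule function are all hypothetical.

```javascript
// Hypothetical helper: one entry per distinct combination of inputs,
// with the rule's output and the number of cases sharing that combination.
function collapseDuplicateInputs(rows, rule) {
    const seen = new Map();
    for (const row of rows) {
        // JSON.stringify turns NaN into null, so label it explicitly.
        const key = JSON.stringify(row.map(v => (Number.isNaN(v) ? "NaN" : v)));
        if (!seen.has(key)) {
            seen.set(key, { inputs: row, output: rule(...row), count: 0 });
        }
        seen.get(key).count += 1;
    }
    return Array.from(seen.values());
}

// Compact stand-in for the allocation rules (segment, Q3, Q4).
const rule = (segment, Q3, Q4) => {
    if (!isNaN(segment)) return segment;
    if (Q3 == 1) return Q4 <= 5 ? 1 : Q4 >= 6 ? 2 : NaN;
    if (Q3 == 2) return Q4 <= 5 ? 3 : Q4 >= 6 ? 4 : NaN;
    return NaN;
};

// Made-up cases: [segment, Q3, Q4]
const cases = [[NaN, 1, 3], [2, 1, 7], [NaN, 1, 3], [NaN, 2, 8]];
const collapsed = collapseDuplicateInputs(cases, rule);

console.log(collapsed.length);                     // 3 distinct combinations
console.log(collapsed.map(c => c.output));         // [ 1, 2, 4 ]
```

Scanning the collapsed list makes it easier to confirm that every combination with a missing segment ends up with the output you intended.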

Resulting variable

The resultant variable will be numeric (in type). You need to turn it into a categorical variable and check the Values (re-labeling anything as appropriate).

See for yourself

The above questions are contained in this QPack, for you to work with.
