How to Split Text Strings in Q
In this post we consider a simple example, where people’s responses to an awareness question on soft drinks have been stored in a single variable, with commas separating the 1st mention, 2nd mention, and so on. The raw data looks like this:
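Respondent #16 has three responses to the awareness question. Splitting that response at each comma produces three separate pieces of information, which can then be stored separately or processed further.
Splitting Text Strings with JavaScript
In Q, one way to split a text variable is with a JavaScript variable. When creating it: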
- Choose Text if you want the output to be another text variable, or Numeric if you want the output variable to have numeric values.
- Enter your Expression and click OK.
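To create a new variable containing the 1st mention for each respondent, we would use the following expression. It splits each response at the commas and returns the first element of the resulting array (JavaScript arrays are numbered from 0):
var string = awareness;
var split_string = string.split(',');
split_string[0];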
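The expression can be generalized. The sketch below is plain JavaScript written for illustration and has not been checked against Q's expression engine: it wraps the split in a function so that any mention can be extracted, trims the space that often follows a comma, and returns a blank rather than undefined when a respondent gave fewer mentions. The name `getMention` and the sample responses are hypothetical; in Q the same logic would be applied to the `awareness` variable.

```javascript
// Hypothetical helper: return the nth mention (1-based) of a comma-separated
// response, trimmed of surrounding spaces, or "" if there is no such mention.
function getMention(text, n) {
  if (text === null || text === undefined || text === "") {
    return ""; // respondent gave no answer
  }
  var parts = text.split(",");      // "Coke, Pepsi" -> ["Coke", " Pepsi"]
  var mention = parts[n - 1];       // n is 1-based, arrays are 0-based
  return mention === undefined ? "" : mention.trim();
}

// Made-up responses, standing in for the awareness variable.
var responses = ["Coke, Pepsi, Sprite", "Pepsi", ""];
var firstMentions = responses.map(function (r) { return getMention(r, 1); });
var secondMentions = responses.map(function (r) { return getMention(r, 2); });
// firstMentions:  ["Coke", "Pepsi", ""]
// secondMentions: ["Pepsi", "", ""]
```

The function is written with `var` and `function` rather than newer syntax, on the assumption that simpler JavaScript is more likely to be accepted by older engines.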
The Expression and Preview of results show us how the formula is working:
Splitting Text Strings with R
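To split text in R, use the strsplit() function. Its basic syntax is:
strsplit(x, split)
where x is the character vector (or single string) you want to split and split is the separator. By default split is treated as a regular expression, so characters such as . or | must be escaped (for example '\\.') or fixed = TRUE added. The function returns a list with one element per input string, each element holding that string's pieces.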
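As a rough picture of the list that strsplit() returns, the JavaScript sketch below splits each string in an array separately. This is an analogue, not R: the result has one array per input string, and those arrays can differ in length, which is why the result cannot be used directly as a set of columns. `splitAll` and the sample data are hypothetical.

```javascript
// Rough JavaScript analogue of R's strsplit(x, split): one array of pieces per
// input string. Like strsplit's list, the result is "ragged".
function splitAll(values, separator) {
  return values.map(value => value.split(separator));
}

const awareness = ["Coke,Pepsi,Sprite", "Pepsi", "Fanta,Coke"];
const pieces = splitAll(awareness, ",");
const lengths = pieces.map(p => p.length); // [3, 1, 2]: one count per respondent
```

Two differences are worth keeping in mind: a string separator in JavaScript's split() is matched literally, whereas strsplit() treats it as a regular expression by default; and for an empty string, strsplit() gives a zero-length vector while "".split(",") gives [""].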
To bring the R output into Q, create a new R variable: choose Create > Variables and Questions > Variable(s) > R Variable from the menu bar, or right-click in the Variables and Questions tab and choose Insert Variable(s) > R Variable.
Now in the Edit R Variable window, give your new variable a name in the Question Name field at the bottom of the window and build your R code in the R CODE section.
The R code must first refer to the variable whose text you want to split and then specify the separator. For example, to split a variable named TextVar1 at each space:
Splitdata = strsplit(TextVar1, ' ')
This produces an object in R called a list. Q cannot interpret lists as variables. The list must be converted into a data frame or matrix.
When every text string splits into the same number of segments (for example, timestamps in the form hh:mm:ss always have 3 segments), you can use rbind() to arrange all of them into columns at once:
do.call(rbind, Splitdata)
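For the equal-length case, the idea behind do.call(rbind, ...) is to treat each split string as one row of a matrix. A JavaScript sketch of that idea follows; `stackRows` and the sample timestamps are hypothetical. Unlike rbind(), which recycles shorter vectors to fill a row instead of failing, the sketch throws an error when the rows differ in length, which makes the equal-length assumption explicit.

```javascript
// Hypothetical helper mirroring do.call(rbind, ...) for strings that all split
// into the same number of pieces, such as hh:mm:ss timestamps.
function stackRows(strings, separator) {
  const rows = strings.map(s => s.split(separator));
  if (rows.length === 0) return rows;
  const width = rows[0].length;
  // Only a rectangular result can be read as a matrix, so check every row.
  if (!rows.every(row => row.length === width)) {
    throw new Error("strings split into different numbers of pieces");
  }
  return rows; // rows[i][j] is piece j of string i, i.e. row i, column j
}

const times = ["09:15:00", "13:02:45", "23:59:59"];
const matrix = stackRows(times, ":");
const hours = matrix.map(row => row[0]); // first column: ["09", "13", "23"]
```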
When the text strings do not split into the same number of segments, as in our awareness example, it takes a bit more work to organize the data as columns. The following code finds the largest number of segments in any string and pads every shorter result to that length:
x = strsplit(awareness, ',')
# get max length
n = max(sapply(x, length))
# pad each shorter vector with NA up to length n
for (j in 1:length(x)) length(x[[j]]) = n
z = do.call(rbind, x)
z
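The padding step can be sketched the same way in JavaScript. This is an analogue, not the R code itself: `padRows` is a hypothetical name, and `null` is assumed to stand in for R's NA, which is what `length(x[[j]]) = n` inserts when it lengthens a vector. Padding matters in R because rbind() recycles shorter vectors rather than leaving gaps, so a respondent with a single mention would otherwise have it repeated across every column.

```javascript
// Hypothetical analogue of the R padding loop. null stands in for R's NA.
function padRows(rows) {
  // Longest row, like n = max(sapply(x, length)).
  const n = rows.reduce((max, row) => Math.max(max, row.length), 0);
  return rows.map(row => {
    const padded = row.slice();                  // copy, so the input is unchanged
    while (padded.length < n) padded.push(null); // like length(x[[j]]) = n
    return padded;
  });
}

const x = ["Coke,Pepsi,Sprite", "Pepsi", "Fanta,Coke"].map(s => s.split(","));
const z = padRows(x);
// z: [["Coke", "Pepsi", "Sprite"], ["Pepsi", null, null], ["Fanta", "Coke", null]]
```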
The output shows the data split into columns, with “NA” in the cells where a respondent gave fewer mentions. To replace these with blanks, add one line of code just before the final line:
x = strsplit(awareness, ',')
# get max length
n = max(sapply(x, length))
for (j in 1:length(x)) length(x[[j]]) = n
z = do.call(rbind, x)
z[is.na(z)] = '' # Replace NAs with blanks
z
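Continuing the JavaScript analogue, replacing missing cells with blanks is a single pass over the matrix. `blankMissing` is a hypothetical name, and the input is assumed to use `null` for missing mentions in the way R uses NA.

```javascript
// Hypothetical counterpart of z[is.na(z)] = '': swap every null for a blank.
function blankMissing(matrix) {
  return matrix.map(row => row.map(cell => (cell === null ? "" : cell)));
}

const padded = [["Coke", "Pepsi", "Sprite"], ["Pepsi", null, null]];
const cleaned = blankMissing(padded);
// cleaned: [["Coke", "Pepsi", "Sprite"], ["Pepsi", "", ""]]
```

The function returns a new matrix rather than editing cells in place, so the padded version stays available for comparison.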
Press the Play button to verify your output. Once verified, click the Add R variable button to complete the process.
Finally, the new variables can be labeled by using the colnames() function to name the columns of the matrix:
x = strsplit(awareness, ',')
# get max length
n = max(sapply(x, length))
for (j in 1:length(x)) length(x[[j]]) = n
z = do.call(rbind, x)
z[is.na(z)] = '' # Replace NAs with blanks
colnames(z) = paste0('Mention: ', 1:ncol(z))
z
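The labeling step can be mirrored in JavaScript too: build one `Mention: k` label per column, as paste0('Mention: ', 1:ncol(z)) does, then pair each label with its column. `mentionLabels` and `namedColumns` are hypothetical helpers for illustration only; in Q the labels come from colnames() in the R code.

```javascript
// Hypothetical mirror of paste0('Mention: ', 1:ncol(z)): one label per column.
function mentionLabels(matrix) {
  const ncol = matrix.length > 0 ? matrix[0].length : 0;
  return Array.from({ length: ncol }, (_, i) => "Mention: " + (i + 1));
}

// Pair each label with its column, e.g. to inspect one mention at a time.
function namedColumns(matrix) {
  const result = {};
  mentionLabels(matrix).forEach((label, j) => {
    result[label] = matrix.map(row => row[j]);
  });
  return result;
}

const z = [["Coke", "Pepsi", "Sprite"], ["Pepsi", "", ""], ["Fanta", "Coke", ""]];
const cols = namedColumns(z);
// cols["Mention: 2"]: ["Pepsi", "", "Coke"]
```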
We hope you found this article helpful! To discover how you can do more in Q, head on over to Using Q!