How to Stack Data in Displayr Using R
Data stacking is a data preparation step where a data set is split into subsets, and the subsets are merged by case (or stacked on top of one another). The number of variables in the data decreases, and the number of cases increases. This is sometimes referred to as converting data from a wide format to a long format. This article describes how to create a stacked data set in Displayr using R code.
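The sketch below stacks a made-up data frame with base R's reshape function. The column names (id, age, rating_1 to rating_3) are hypothetical. Two respondents each rate three brands. After stacking there is one row per respondent per brand, so 2 cases and 5 variables become 6 cases and 4 variables, and age is repeated for each brand.

```r
# Toy wide-format data (hypothetical): one row per respondent, one rating column per brand
wide <- data.frame(
  id = c(1, 2),
  age = c("18-24", "25-34"),
  rating_1 = c(5, 3),
  rating_2 = c(4, 2),
  rating_3 = c(1, 5)
)

# Stack the three rating columns into a single "rating" column
long <- reshape(
  data = wide,
  idvar = "id",
  direction = "long",
  varying = list(c("rating_1", "rating_2", "rating_3")),
  v.names = "rating",   # name of the stacked column
  timevar = "brand"     # index column recording which rating_ column each value came from
)

dim(wide)  # 2 5
dim(long)  # 6 4
long
```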
Create an R Data Set
The R data set option in Displayr allows you to process and modify a data set with R code before the data enters your document. This makes it the option of choice when you have a data set that you want to stack (or otherwise restructure) before making use of the standard tools for data analysis that are available in Displayr.
To add an R data set to your document:
- Select Home > + Data Set.
- Select the R option.
- Enter the code needed for sourcing and stacking your data. To try it out, you can paste in the code from the worked example below.
- Enter a Name for the Data Set.
- Click OK.
The third step, entering the code, is where all the key details lie. The rest of this post discusses how to think about your data stacking and how to set up the code.
A worked example
In this example we consider data from a simple survey that asked people to rate different technology brands. The code that we used is available below. The following image shows a subset of the data before and after stacking.
When stacking a data frame in R you should start by working out four key pieces of information:
- Which variables are to be stacked? Below, we will construct a list to describe the sets of variables to be stacked.
- Which variable is the ID variable?
- Which other variables in the data should be included but not stacked? These will be stretched, which means that their values will be repeated several times for each of the original cases.
- Which variables do you want to exclude from the new data? By a process of elimination, this is everything not covered in the first three points.
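Before reshaping, it can help to check these choices against the data. The function below is a hypothetical helper (check_stacking_spec is not part of Displayr or base R). It confirms that every named variable exists, that every set of stacked variables has the same length (reshape requires this), and that no variable is assigned to more than one role. The toy data frame is made up for illustration.

```r
# Hypothetical helper: sanity-check a stacking specification before calling reshape()
check_stacking_spec <- function(data, id.variable, variables.to.stretch, variables.to.stack) {
  stacked <- unlist(variables.to.stack)

  # Every variable named in the specification must exist in the data
  missing.vars <- setdiff(c(id.variable, variables.to.stretch, stacked), names(data))
  if (length(missing.vars) > 0)
    stop("Not found in data: ", paste(missing.vars, collapse = ", "))

  # Each set of stacked variables must have the same number of variables
  if (length(unique(lengths(variables.to.stack))) != 1)
    stop("All sets in variables.to.stack must contain the same number of variables")

  # No variable may be the ID, stretched, and/or stacked at the same time
  roles <- c(id.variable, variables.to.stretch, stacked)
  if (anyDuplicated(roles))
    stop("Variables used in more than one role: ",
         paste(unique(roles[duplicated(roles)]), collapse = ", "))

  invisible(TRUE)
}

# Usage on a made-up data frame
toy <- data.frame(id = 1:2, age = c(30, 40), a_1 = 1:2, a_2 = 3:4, b_1 = 5:6, b_2 = 7:8)
check_stacking_spec(toy, "id", "age", list(A = c("a_1", "a_2"), B = c("b_1", "b_2")))
```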
The R code
The following code reads in an unstacked data file and stacks it:
```r
# Reading in the data
library(foreign)
tech = suppressWarnings(read.spss("https://wiki.q-researchsoftware.com/images/3/35/Technology_2018.sav",
                                  use.value.labels = TRUE, to.data.frame = TRUE))

# Stacking the data
id.variable = 'RESPNUM'                     # identifies each respondent
variables.to.stretch = c('Q1', 'Rec_Age')   # kept but not stacked; repeated for every brand
variables.to.stack = list(                  # one named set per attribute, 13 brands each
  'Recommend' = c('Q3_01', 'Q3_02', 'Q3_03', 'Q3_04', 'Q3_05', 'Q3_06', 'Q3_07',
                  'Q3_08', 'Q3_09', 'Q3_10', 'Q3_11', 'Q3_12', 'Q3_13'),
  'Fun' = c('Q4a_01', 'Q4a_02', 'Q4a_03', 'Q4a_04', 'Q4a_05', 'Q4a_06', 'Q4a_07',
            'Q4a_08', 'Q4a_09', 'Q4a_10', 'Q4a_11', 'Q4a_12', 'Q4a_13'),
  'Worth what you pay for' = c('Q4b_01', 'Q4b_02', 'Q4b_03', 'Q4b_04', 'Q4b_05', 'Q4b_06', 'Q4b_07',
                               'Q4b_08', 'Q4b_09', 'Q4b_10', 'Q4b_11', 'Q4b_12', 'Q4b_13'),
  'Innovative' = c('Q4c_01', 'Q4c_02', 'Q4c_03', 'Q4c_04', 'Q4c_05', 'Q4c_06', 'Q4c_07',
                   'Q4c_08', 'Q4c_09', 'Q4c_10', 'Q4c_11', 'Q4c_12', 'Q4c_13'),
  'Good customer service' = c('Q4d_01', 'Q4d_02', 'Q4d_03', 'Q4d_04', 'Q4d_05', 'Q4d_06', 'Q4d_07',
                              'Q4d_08', 'Q4d_09', 'Q4d_10', 'Q4d_11', 'Q4d_12', 'Q4d_13'),
  'Stylish' = c('Q4e_01', 'Q4e_02', 'Q4e_03', 'Q4e_04', 'Q4e_05', 'Q4e_06', 'Q4e_07',
                'Q4e_08', 'Q4e_09', 'Q4e_10', 'Q4e_11', 'Q4e_12', 'Q4e_13'),
  'Easy-to-use' = c('Q4f_01', 'Q4f_02', 'Q4f_03', 'Q4f_04', 'Q4f_05', 'Q4f_06', 'Q4f_07',
                    'Q4f_08', 'Q4f_09', 'Q4f_10', 'Q4f_11', 'Q4f_12', 'Q4f_13'),
  'High quality' = c('Q4g_01', 'Q4g_02', 'Q4g_03', 'Q4g_04', 'Q4g_05', 'Q4g_06', 'Q4g_07',
                     'Q4g_08', 'Q4g_09', 'Q4g_10', 'Q4g_11', 'Q4g_12', 'Q4g_13'),
  'High performance' = c('Q4h_01', 'Q4h_02', 'Q4h_03', 'Q4h_04', 'Q4h_05', 'Q4h_06', 'Q4h_07',
                         'Q4h_08', 'Q4h_09', 'Q4h_10', 'Q4h_11', 'Q4h_12', 'Q4h_13'),
  'Low prices' = c('Q4i_01', 'Q4i_02', 'Q4i_03', 'Q4i_04', 'Q4i_05', 'Q4i_06', 'Q4i_07',
                   'Q4i_08', 'Q4i_09', 'Q4i_10', 'Q4i_11', 'Q4i_12', 'Q4i_13'))

# Everything that is not the ID, a stretched variable, or a stacked variable is excluded
all.names <- names(tech)
variables.to.exclude = all.names[!all.names %in% c(unlist(variables.to.stack), id.variable, variables.to.stretch)]

# Convert from wide to long format
stacked.tech = reshape(data = tech, idvar = id.variable, direction = "long",
                       drop = variables.to.exclude, varying = variables.to.stack)

# Rename the columns by position: reshape() returns the non-stacked columns in their
# original order, then an index column (renamed "brand" here), then one column per stacked set.
# This assumes RESPNUM, Q1 and Rec_Age appear in that order in the original file.
names(stacked.tech) = c(id.variable, variables.to.stretch, "brand", names(variables.to.stack))
stacked.tech
```
To begin, we import the data file from a URL using the read.spss function from the foreign package. You can choose whichever function suits your data format, but your data set must be accessible online, as Displayr does not have access to your local PC when running R calculations.
Next, set out which variables in the data contain the respondent IDs, which variables are to be included but not stacked (these will instead be stretched), and which variables are to be stacked. The trickiest part of this code is the list that describes which variables to stack. There are some important points to note:
- Each element of the list has a name to tell us what the variables mean. This will become the name of the variable in the stacked data frame.
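- Each element of the list is a vector that gives the variable names in the original data frame and the order in which they are to be stacked. All of the vectors must be the same length (here, 13, one per brand), because the nth variable in every set ends up in the same stacked row.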
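Because the variable names in this example follow a regular pattern (a question stem, an underscore, and a two-digit brand number), the list can be built with a few lines of code instead of being typed out. This is a sketch that assumes the pattern holds for every set. It produces the same variables.to.stack list as the hand-written version in the worked example.

```r
# Two-digit brand suffixes: "01", "02", ..., "13"
brand.suffixes <- sprintf("%02d", 1:13)

# Map each attribute name to its question stem
stems <- c(
  "Recommend" = "Q3",
  "Fun" = "Q4a",
  "Worth what you pay for" = "Q4b",
  "Innovative" = "Q4c",
  "Good customer service" = "Q4d",
  "Stylish" = "Q4e",
  "Easy-to-use" = "Q4f",
  "High quality" = "Q4g",
  "High performance" = "Q4h",
  "Low prices" = "Q4i"
)

# lapply() keeps the names of stems, so each list element is named after its attribute
variables.to.stack <- lapply(stems, function(stem) paste0(stem, "_", brand.suffixes))

variables.to.stack[["Fun"]]
```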
The line beginning with variables.to.exclude works out which variables to exclude by a process of elimination. It keeps every column name that is not the ID variable, a stretched variable, or one of the stacked variables.
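For comparison, base R's setdiff function gives the same result as the %in% expression and may read more naturally as "all names minus the ones we keep". The two are equivalent here because data frame column names are unique. This is a minimal sketch on a hypothetical set of column names. Weight is made up for illustration.

```r
# Toy column names (Weight is hypothetical and should end up excluded)
all.names <- c("RESPNUM", "Q1", "Rec_Age", "Q3_01", "Q3_02", "Weight")
id.variable <- "RESPNUM"
variables.to.stretch <- c("Q1", "Rec_Age")
variables.to.stack <- list(Recommend = c("Q3_01", "Q3_02"))

keep <- c(unlist(variables.to.stack), id.variable, variables.to.stretch)

# Approach used in the worked example
excluded.a <- all.names[!all.names %in% keep]

# Equivalent approach with setdiff()
excluded.b <- setdiff(all.names, keep)

excluded.b  # "Weight"
```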
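We use the reshape function to create a stacked data frame. Its arguments tell it what to do with each of the columns: idvar identifies the ID variable, varying lists the sets of variables to stack, and drop removes the excluded variables. Any remaining columns are stretched. Setting direction = "long" tells the function to stack the data (wide to long) rather than unstack it (direction = "wide"). By default, reshape also adds an index column called time, which records the position of each variable within its set (1 to 13 here). The final line of the code renames the columns, including calling this index brand, so that the stacked data appears in Displayr with meaningful names.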
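The final line of the worked example renames columns by position, so it depends on the column order that reshape returns. As an alternative sketch, reshape's v.names and timevar arguments can name the new columns directly. The ID and stretched variables then keep their original names and no positional renaming is needed. The toy data frame below is made up and mimics a few of the example's variable names.

```r
# Toy data with two stacked sets and two brands (values are made up)
tech.toy <- data.frame(
  RESPNUM = c(101, 102),
  Q1 = c("Male", "Female"),
  Q3_01 = c(9, 7), Q3_02 = c(6, 8),
  Q4a_01 = c(1, 0), Q4a_02 = c(0, 1)
)
variables.to.stack <- list(
  "Recommend" = c("Q3_01", "Q3_02"),
  "Fun" = c("Q4a_01", "Q4a_02")
)

stacked.toy <- reshape(
  data = tech.toy,
  idvar = "RESPNUM",
  direction = "long",
  varying = variables.to.stack,
  v.names = names(variables.to.stack),  # name each stacked column after its list element
  timevar = "brand"                     # use "brand" instead of the default "time"
)

names(stacked.toy)  # "RESPNUM" "Q1" "brand" "Recommend" "Fun"
```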
Tips for working with R Data Sets
To make the most of R's data-manipulation tools when stacking your data, keep the following in mind:
- You need to host your data set on the web. R in Displayr does not have access to files on your local machine.
- If you are unsure of the exact code you need to stack your data set, you can prototype it in an R Output by selecting Insert > R Output, typing in your R code, and clicking Calculate. The R Output lets you preview the results and modify your code as you go. You can then use the same code to add your R data set.
- An R data frame doesn't carry the same level of metadata as file types such as SSS and SAV. For example, variables in an R data frame do not have both a Name and a Label. Remember to add such information in Displayr after you have added your data set.
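While prototyping in an R Output, a few quick checks can confirm that the stacking did what you expected. This sketch uses a made-up data frame. The stacked data should have one row per original case per stacked level, each ID should appear once per level, and the values themselves should be unchanged.

```r
# Made-up data: three respondents, two stacked variables
ratings.wide <- data.frame(id = 1:3, q_1 = c(5, 4, 3), q_2 = c(2, 1, 5))
ratings.long <- reshape(ratings.wide, idvar = "id", direction = "long",
                        varying = list(c("q_1", "q_2")), v.names = "q", timevar = "brand")

n.levels <- 2  # number of variables in each stacked set

# One row per original case per stacked level
stopifnot(nrow(ratings.long) == nrow(ratings.wide) * n.levels)
# Every ID appears exactly once per stacked level
stopifnot(all(table(ratings.long$id) == n.levels))
# Stacking moves values around but should not change them
stopifnot(sum(ratings.long$q) == sum(ratings.wide$q_1, ratings.wide$q_2))

head(ratings.long)
```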
There are lots of other things you can do in Displayr using R. Find out what they are by heading over to R in Displayr!