Your Guide for Working with Excel Files in Displayr
If you can export an SPSS file from your survey platform to use in your analysis, then Displayr will usually be able to do all of the data setup for you. However, if you end up with an Excel, file you may find yourself having to do a bit more work to get started with the data.
Why is that? The reason is that your SPSS files contain metadata that tells Displayr more about how your questions were asked in the survey and hence how Displayr should set them up.
In an Excel file (.xlsx or .csv), there is no metadata – all Displayr is able to see are the contents of each column. If you have some control over the contents of the Excel file then you can optimize your experience using that data in Displayr with a few simple guidelines.
This article covers two main areas:
- How to properly format an Excel-style file for Displayr
- How to set things up in Displayr after importing the data
For both phases, it is important that you have a copy of the questionnaire on hand so that you can be confident about changes that you make in Excel or Displayr.
Formatting Excel files for Displayr
The basic layout of your Excel file is as follows, and an example of what it should look like is included in the picture below:
- All of your data should be in one worksheet. Displayr won’t be able to use data in multiple worksheets.
- Each variable from your survey should have its own column (e.g., Age is in one column, Education is in another).
- The first row of the Excel file should have the names for each variable. Make sure these headings begin with letters or words, not numbers or symbols (#!?&), and that make sure these headings describe what is in each column (e.g., Age or Q1. Age rather than just Q1). This will make things a lot easier when the data is in Displayr.
- Each person’s responses should be contained in a separate row.
- Where possible, ensure that categories are stored as labels rather than numbers (e.g. use Male and Female instead of 1 and 2 in the column for the gender question).
- You should not have any merged cells in the table.
In the next section, I will explain the best way to arrange the contents of your columns for different types of data.
Setup for different data types
While Excel files do not have the metadata to tell Displayr how to set up the data, Displayr will try to infer the structure of your data based on what is in the columns and column headers. There are some conventions that you can use that will help reduce the amount of setup that you need to do once the file is imported.
When a column contains categories, it is almost always better that the column contains the labels of the categories rather than numbers which indicate the category values. The reason is that if the labels are not present, you will need to enter them later. If you’re stuck with numbers in your file, you can use Excel’s find-and-replace feature to replace values with labels within the individual columns or even groups of columns.
When you have a multiple response question, the data should be stored in the Excel file with one column for each of the possible options that were shown in the survey. Be sure to avoid storing multiple responses within a single column, as this will make the data very difficult to analyze in Displayr.
An example of the proper layout for a multiple response question is shown above in the last three columns. The columns headed as Q1a, Q1b, and Q1c represent three options that were shown in a multiple-response question. More tips for column headers for this type of data are considered below.
Missing data and ‘Don’t know’ categories
When a respondent has not given an answer to a question or skipped the question, then they should have an entry of NA in that cell. When working with categories, other kinds of responses which indicate the respondent saw the question but did not give an answer are generally best given a label of Don’t know or whatever phrasing matches your questionnaire.
Good conventions for column headers
Above I mentioned that your headers should always start with a letter rather than a number and that they should be as descriptive as possible. Here I’ll take a deeper look at some common cases where improving the headers will make a big difference.
Ditch multiple header rows
Some online survey providers will export an Excel or CSV file with two or more header rows describing the data. Here is one such example that actually contains three header rows:
The respondent data does not begin until the fourth row in the file. The first row contains short names, the second row contains the full question text, and the third row contains some additional tags.
Before using a file like this in Displayr you must reduce it to one header. In some cases, you may simply delete the rows that you don’t need. In other cases, you might like to use the Excel CONCATENATE() function to create a new header row. CONCATENATE() is used to paste together two or more pieces of text, and so could be used to combine the information in the second row with the short names in the first row. By using a formula like =CONCATENATE(R1, ” – “,R2) and a little clever copying-and-pasting, the headers can quickly take a more appropriate shape:
Depending on the nature of your file, some further tidying may be necessary, but a little trick like this can get you a long way.
Use headings to ensure your variables get grouped together
There is a range of survey question styles which collect multiple answers from each person. The most common examples include the traditional multiple response question, and single response and multiple response grids. In most cases in your workflow, you will want to have the data from such questions grouped together in Displayr. The column headers that you use can help Displayr to identify when a group of variables should be analyzed together. The basic idea is that the column headers should contain an identifiable structure which indicates when variables are meant to go together.
The screenshot in the first section of this article contains one such set of variables for a multiple response question. The column headers are Q1a, Q1b, and Q1c. The common prefix is enough to tell Displayr that these variables go together.
Consider the case shown below, where we have asked people what aspects they feel are important when deciding what food to get delivered for a night in. The labels only tell us which response option was shown and there is no info in the headers to indicate that the variables are related to one another. These variables would not get grouped together when the data is imported into Displayr.
A better layout for these variables is as follows. Here, the text from the question has been included in the headers, and this pattern will ensure that Displayr groups these variables together when the data is imported.
The common prefix in the headers, “How important are the following when ordering delivery food? – “, is enough to allow Q to recognize the structure of this data.
Using Excel files in Displayr
The tips above will help you get the best possible Excel file for Displayr. Even with a tidy file, you may need to complete some other setup once you have imported the file into Displayr. This includes entering or tidying labels, recoding numeric values, and grouping variables. All of these things are really easy to take care of in Displayr once you know how!
Importing and checking your data
To figure out what parts of your data need to be tidied, it is good to first import your data and then go through your questions one-by-one by creating individual tables and reviewing them to ensure they are setup correctly.
To import your data into Displayr, from the menus select using Home > New Data Set > My Computer, locate and select your file and then click Open. Displayr will attempt to automatically detect your data file structure and add the file to the Data Sets tree.
To create a table from a question, simply drag the question from the Data Sets tree onto the page. Displayr will automatically create a table based on the structure of the question. Now I will go through some common issues you might encounter and how to address them.
Table shows an Average instead of categories
In this example, the Age question in my Excel file contained numbers for the age groups instead of labels. The table in Displayr is showing an Average of those numbers instead of showing the percentage of people in each age group.
To show categories instead of an Average you should:
- Select the Age variable in the Data Sets tree.
- From the menus, select Data Manipulation > Percentages (from the Structure menu group)
This process, called changing the Variable Set Structure, is one of the most fundamental things to learn about controlling your data in Displayr.
To learn more about the structure of data in Displayr, see Understanding Variable Sets in Displayr: A Tutorial.
Categories don’t have labels
If you have columns with numbers representing categories, like the table above, you will want to add category labels. Labels for categories can easily be added in Displayr.
To enter labels:
- Select the variable from the Data Sets tree.
- From the Properties on the right, click the Labels button.
- Enter the appropriate labels for each of the values.
- Click OK.
Recoding numeric values
When you first import your Excel file, Displayr will assign numeric values for any categories that it identifies. This is, importantly, different from using SPSS files and other files with metadata, where categories are stored with both a label and a numeric value.
For some categorical data, like Gender, the numeric values are relatively unimportant. However, if your data contains a scale, like satisfaction ratings, then the numeric values can be useful to your analysis in computing an average for the scale among different groups of respondents.
To check and change the numeric values that have been assigned
- Select the variable from the Data Sets tree.
- From the Properties on the right, click the Values button.
- Change the entries in the Value column as needed.
- Click OK.
In this example, I have a 5-point scale where the categories have been coded as:
- Dislike = 1
- Hate = 2
- Like = 3
- Love = 4
- Neither like nor dislike = 5
This is not a useful set of values for this data. I can recode the categories by simply changing the entries in the Value column as follows:x
- Hate = -2
- Dislike = -1
- Neither like nor dislike = 0
- Like = 1
- Love = 2
Tidying variable labels
Variable labels come from the column headers in your Excel file. These labels will affect the way items appear in tables, particularly for multiple response questions, and other types of questions that contain multiple variables (like grids). You can access variable labels by right-clicking on a variable in the Data Sets tree and selecting rename. From here you can enter whatever label you want for that variable.
Data appears as Text
When Displayr sees a column full of text in the Excel file it needs to make a decision about whether the text in the column represents a set of category labels or it represents open-ended responses. Displayr makes an educated guess based on the number of unique responses in the column, and the lengths of the responses. Sometimes Displayr can guess incorrectly, particularly if, for example, you have a really large brand list.
If Displayr has not identified your data as being categorical, the table will show all of the responses. To create a new copy of the data with each unique response turned into a category you should:
- Select the text variable from the Data Sets tree (in this example it is Preferred Cola).
- In the Properties on the right, change the Structure drop-down to Nominal: Mutually exclusive categories. A new categorical variable will be created.
- Drag the new categorical variable onto the page to create a table showing response percentages.
If variables are meant to be grouped, as in a multiple response question or grid, you can group them as follows:
- Hold down your Ctrl key and from the Data Sets tree, select each of the variables that you want to group.
- Select Data Manipulation > Combine (from the Variables menu group). The variables will be grouped together into a single question.
- Expand the combined question in the Data Sets tree to view the individual variables.
In the following example, I have rating scales for several soft drink brands. Each is initially setup as Nominal: Mutually exclusive categories. I group them as a Nominal - Multi (so that the brand ratings can be shown together).
Time to get started
If you follow the steps above to preparing your Excel or CSV file for use in Displayr you will be ready to get started! These specs should take care of just about everything, and any remaining issues can easily be addressed in Displayr.