We’re all reimagining how we do things, developing new skills, pitching in, and figuring stuff out. When it comes to market research and data analysis, we can help fast-track your survey analysis skills.
Here’s a little summary of some of the subjects we cover in this webinar
If you’re:
- then let us show you how to analyze survey data!
I'm going to walk you through the basics of how to analyze a survey. We've put this together for complete novices, so if you've never gone through all the steps in analyzing a survey on your own, this webinar is for you.
I will take you through eight stages of analyzing a survey.
Case study
And, we will do it using a case study, where we explore how likely people were to buy this product concept. Please take a moment to read it.
Getting the right type of data file
The first stage in analyzing a survey is getting data in the right format. This is the first big mistake that people make when they analyze surveys. They click the export button in their data collection software, get an Excel or CSV file, and try to analyze it. But Excel and CSV files weren't invented for survey analysis. Yes, you can use them, but you will double the time it takes to do the analysis and you will likely make lots of mistakes.
There are three great data file formats specifically designed for surveys:
What marginally OK data looks like
But, let's say you just can't get one of these better data files, and need to use an Excel or CSV file. You then often need to spend a bit of time tidying it up before you import it.
You need
What terrible data looks like
Terrible data is data that looks different to what I just described. If you try and import data like what's on the screen now, you will end up with lots of problems.
What impossible-to-analyze data looks like
And, if your data is already tabulated, you're basically stuffed, and you will just get error messages if you try and import tables like this.
You need what's called the raw data. That is, data with 1 row for each respondent.
Cleaning and tidying by question
I will import an SPSS .sav file now.
In Displayr:
Add data set
When we import the data, it is represented as variables or sets of variables. Let's take a look at one.
Most data files will have a variable in them called ID or something similar. It's the unique code associated with each person that does the survey. For example, the first respondent's code is what's shown here. This data is not so interesting, so let's look at the next variable. This shows how long, in seconds, each interview took. Rather than hover over them one by one, I will start by running a report that summarizes all the data.
In Displayr:
Insert > Report > Short Report
When you are doing cleaning and tidying, you always want tables rather than charts.
Duration (in seconds)
This is how long the questionnaire took to complete, on average, in seconds. That's an average of a bit over 10 minutes. We want to look at the minimum.
In Displayr:
Statistics > Cells : Min
So, the fastest person did the survey in a little under 4 minutes. That's plausible. If the numbers were implausible, we would need to delete the data with the implausible values.
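Outside Displayr, the same plausibility check can be sketched in a few lines of Python. This is only an illustration: the durations, the respondent IDs, and the 2-minute cut-off are assumptions made for the example, not values from the iLock study.

```python
from statistics import mean

# Hypothetical interview durations in seconds, keyed by respondent ID.
durations = {"R1": 612, "R2": 231, "R3": 45, "R4": 980, "R5": 705}

# Assumed cut-off: anything under 2 minutes is treated as implausibly fast.
MIN_PLAUSIBLE_SECONDS = 120


def summarize_durations(durations):
    """Return the mean and the minimum duration, both in seconds."""
    values = list(durations.values())
    return mean(values), min(values)


def implausible_ids(durations, threshold=MIN_PLAUSIBLE_SECONDS):
    """Return the IDs of respondents who finished faster than the threshold."""
    return sorted(rid for rid, secs in durations.items() if secs < threshold)


average, fastest = summarize_durations(durations)
flagged = implausible_ids(durations)
print(f"mean={average:.1f}s min={fastest}s flagged={flagged}")
```

The flagged IDs are the candidates for deletion. What counts as "implausible" is a judgment call that depends on the length of the questionnaire.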
User language
We've got data on user language. It's showing us the raw data rather than a summary table.
This is because whoever created the data file set it up to show this data as if it was text rather than categories. We can change this to instead show a table of percentages.
In Displayr:
Data Manipulation > Percentages
So, 100% of people doing the survey speak English.
Gender
Note that Displayr's showing percentages by default. People that are new to survey analysis often want to show how many people chose each option.
In Displayr:
STATISTICS > CELLS > Count
STATISTICS > CELLS > %
This is usually the wrong thing to do. This survey is from a study of American adults. Knowing that 166 of them were female isn't interesting.
In Displayr:
STATISTICS > CELLS > %
STATISTICS > CELLS > Count
But now we are saying that according to our survey, 55% of adults in America are Female. If correct, that's a useful thing to know. This is the goal of surveys. To estimate things about the world outside of the survey itself.
We will return to gender a bit later.
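The difference between counts and percentages can be sketched as follows. The 166 female respondents come from the table above. The total of 300 respondents is the size of the data set, and the example assumes, purely for illustration, that the remaining 134 respondents are male.

```python
from collections import Counter

# 166 female respondents (from the table); the 134 male respondents are an
# assumption that makes the total match the 300 cases in the data set.
responses = ["Female"] * 166 + ["Male"] * 134


def counts_and_percentages(values):
    """Return (counts, percentages) for a list of categorical answers."""
    counts = Counter(values)
    total = sum(counts.values())
    percentages = {label: 100 * n / total for label, n in counts.items()}
    return dict(counts), percentages


counts, percentages = counts_and_percentages(responses)
for label in counts:
    print(f"{label}: n={counts[label]}, {percentages[label]:.0f}%")
```

The count only describes the sample. The percentage is what we use as an estimate for the wider population.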
Age
The survey was only given to adults, so this first category isn't interesting. Tidying in this case means removing under 18s.
In Displayr:
Data Manipulation > Hide
State
It's usually better to look at this data as a map.
In Displayr:
Insert > Visualization > Geographic Map
Copy table
Paste into Output in 'Pages' field
Population density
The bottom category's pretty small.
In Displayr:
STATISTICS > CELLS > Count
Only 9 people. That's too small for useful analysis. We need to merge the bottom two categories.
In Displayr:
Drag bottom onto second bottom
Data Manipulation > Rename
Edit to: Less than 10,000 people
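As a rough sketch of what the merge does to the underlying table, the snippet below combines two categories and sums their counts. Only the 9 respondents in the smallest category and the merged label come from the walkthrough. The other labels and counts are hypothetical.

```python
def merge_categories(counts, labels_to_merge, new_label):
    """Return a new dict with the given labels combined under new_label."""
    merged = {}
    merged_total = 0
    for label, n in counts.items():
        if label in labels_to_merge:
            merged_total += n
        else:
            merged[label] = n
    merged[new_label] = merged_total
    return merged


# Hypothetical counts; only the 9 in the smallest category is from the study.
density_counts = {
    "1,000,000 or more people": 120,
    "100,000 to 999,999 people": 95,
    "10,000 to 99,999 people": 50,
    "1,000 to 9,999 people": 26,
    "Fewer than 1,000 people": 9,
}

merged = merge_categories(
    density_counts,
    {"1,000 to 9,999 people", "Fewer than 1,000 people"},
    "Less than 10,000 people",
)
print(merged)
```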
Education
We will merge these too:
In Displayr:
Drag 1st category onto second
Data Manipulation > Rename
Edit to: Never attended college
Race
We will merge the smaller categories
In Displayr:
Select bottom four categories other than NET
Data Manipulation > Merge
Edit to: Other
And, these other labels are too long and will make our report messy
Rename
Income
Let me first rename the variable.
In Displayr:
Click on Data Sets > In which…
GENERAL > Label <> Income
Page title: Income
We've got a lot of income categories. One option is to merge them, but a better option is to treat the data as being numeric.
In Displayr:
Click the Average button.
It's showing me an average income of 17.7. That doesn't make sense! We need to look at the data values to better understand.
In Displayr:
Press DATA VALUES > Values
Ah, the way the data has been set up, an income of less than 1000 is a 1, 1000 to 2999 is 2, and so on. What we can do is replace these values with midpoints. For example:
1 -> 500
2 -> 2000
This is called midpoint recoding. We can do this automatically.
In Displayr:
Click percentages to convert back
TRANSFORMATIONS > Midpoint Coding and Quantification
Drag Income - RECODED onto page
Appearance > $
That makes more sense.
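Midpoint recoding itself is simple arithmetic, and a minimal sketch looks like this. The first two bands match the ones described above (less than 1000, and 1000 to 2999). The third band and the respondents' codes are assumptions added for illustration, and Displayr's own handling of open-ended top categories is not shown.

```python
from statistics import mean

# Income bands as (lower bound, exclusive upper bound).
# Codes 1 and 2 follow the data values described above; code 3 is hypothetical.
BANDS = {
    1: (0, 1000),     # less than 1000
    2: (1000, 3000),  # 1000 to 2999
    3: (3000, 5000),  # assumed next band
}


def midpoint(code):
    """Replace a category code with the midpoint of its income band."""
    low, high = BANDS[code]
    return (low + high) / 2


# Hypothetical category codes for five respondents.
codes = [1, 2, 2, 3, 1]
recoded = [midpoint(c) for c in codes]

print(mean(codes))    # average of the raw codes: not meaningful as an income
print(mean(recoded))  # average of the midpoints: an income estimate
```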
What, if anything … like
You will recall we showed a description of an iLock
In Displayr:
Click on Case Study
Go back to What, if anything…
We asked them to say, in their own words, what they liked.
When we have text data, we need to categorize this into groups, so that we can then summarize the data like any other data. In survey research, this is often called coding.
In Displayr:
Insert > Text Analysis > Manual > Multiple overlapping .. > New
OK, so the first response is garbage. I will create a category to store poor quality data, as we will want to delete these respondents later.
In Displayr:
Rename category as Poor quality data
Select new category
Categorize as
We've got 109 people that said Nothing
In Displayr:
Right click on Poor Quality data
Add category
"Nothing"
The basic idea is that you read through the responses and categorize them by judgment.
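Once the responses have been coded, summarizing them is straightforward. The sketch below assumes each respondent has been assigned a set of categories by a human coder. The respondent IDs and the categories "Convenience" and "Security" are hypothetical. Because the categories overlap, the percentages can add up to more than 100%.

```python
from collections import Counter

# Hypothetical coding result: respondent ID -> categories chosen by the coder.
coded = {
    "R1": {"Convenience", "Security"},
    "R2": {"Nothing"},
    "R3": {"Convenience"},
    "R4": {"Poor quality data"},
    "R5": {"Security"},
}


def category_percentages(coded):
    """Percentage of respondents assigned to each (overlapping) category."""
    counts = Counter(cat for cats in coded.values() for cat in cats)
    n_respondents = len(coded)
    return {cat: 100 * n / n_respondents for cat, n in counts.items()}


percentages = category_percentages(coded)
for category, pct in sorted(percentages.items()):
    print(f"{category}: {pct:.0f}%")
```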
Now, I won't bore you by making you watch. I did it earlier, and I'll load it now.
In Displayr:
Import > Resources - Documents\Data\Concept Test > iLock Likes.Qcodes
In our webinar on text analysis, which you can find on our website, you will find lots of ways to automatically analyze text data like this. Now, this causes new variables to be added to the data set. I'll give you a moment to read what people liked about the iLock. Now, if you look at the 3rd row from the bottom of the table, you can see that 5% provided poor quality data.
To get a better idea of what that means I'm going to filter the raw text, so it only shows the poor quality data.
In Displayr:
Inputs > FILTERS & WEIGHT > New
Data: What, if anything,
I've chosen the data we just created.
In Displayr:
Click on Poor quality data
Click Create Filter.
So, we asked them what they liked about the product. And, the 16 people basically told us junk.
If you think about how surveys work, all the previous questions just asked people to choose options. We have no way of checking if they chose sensibly. This is the first opportunity to see if the people are doing a good job answering, and these 16 people haven't. So, the right thing to do is to delete all their data. If they have given us garbage here, we can't rely on anything they've said.
In Displayr:
Click on Data Sets > iLock.sav
Unique Identifier: Response ID
Delete observations
This will delete the 16 rows of data. It's not permanent. We can undo it later if we need to.
Note over here on the right of the screen, it tells us that the data set contains 300 cases. But, look at the table. It's now based on 284 cases. All the other analyses in this document have been automatically updated to remove these 16 people with bad data.
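The idea of a non-permanent deletion can be sketched as an exclusion list applied whenever an analysis is run. The Response ID values below are hypothetical; only the totals (300 cases, 16 excluded) come from the walkthrough.

```python
# Hypothetical data set: 300 respondents, each identified by a Response ID.
respondents = [{"response_id": i} for i in range(1, 301)]

# Hypothetical IDs of the 16 respondents flagged as poor quality.
poor_quality_ids = set(range(1, 17))


def exclude(rows, excluded_ids):
    """Return the rows to analyze, leaving the original data untouched."""
    return [row for row in rows if row["response_id"] not in excluded_ids]


clean = exclude(respondents, poor_quality_ids)
print(len(respondents), len(clean))  # prints "300 284"
```

Because the original rows are never modified, emptying the exclusion set restores the full data set.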
What, if anything, do you … Dislike
Here we have asked about dislikes.
Title: Dislikes
As with the other text data, we need to code it.
In Displayr:
Click on What, if anything… dislike
Insert > Text Analysis > Manual > Multiple overlapping .. > New
As with before, I've already done it.
In Displayr:
Import - iLock.Dislikes > Save categories
Drag across data
This time it shows 0% with poor quality data. But, in surveys, you need to be a bit careful when you see 0%, as there can still be a few people in the category.
In Displayr:
Statistics > Cells > Count
Ok, so one person with poor quality data. Let's have a look at their data.
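The 0% is just rounding. A quick sketch, assuming the table is based on the 284 respondents left after the earlier deletion:

```python
def display_percent(count, base):
    """Format a share of the base as a whole-number percentage."""
    return f"{round(100 * count / base)}%"


# One respondent out of an assumed base of 284 is about 0.35%.
print(display_percent(1, 284))  # prints "0%"
```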
In Displayr:
Click raw data table
FILTERS & WEIGHT > New
Data: Choose second What if anything
We'll need to give this a unique label
Label: Poor quality data - Dislikes
Created filter
So much for them all being English speaking! I will delete this person as well.
In Displayr:
Click on Data Sets > iLock.sav
Delete observations
Filter: … - Dislikes
Which phrase
This is usually called Purchase intention
I'll change the name of the page and the underlying data to match this
In Displayr:
Title: Purchase intention
Variable > Purchase intent
This data is clean and tidy. Nothing to do.
Compared with similar
This is often called uniqueness
In Displayr:
Title: Uniqueness
This data is clean and tidy. Nothing to do. We will return to this table later.
How well…
This is often called Brand fit.
There are too many categories. We should merge some of these categories.
How likely …
This is priced purchase intent. Nothing to do here.
Browser meta info - Browser
This tells us what type of browser they were using. Let's look at this as percentages:
In Displayr:
Data Sets > Browser Meta … > Data Manipulation > Percentages
Techniques for cleaning
We've just gone through the process of cleaning and tidying. This page summarizes what we just did.
Weighting
Next comes weighting. It's also known as sample balancing, raking, and post stratification. The basic idea here is that, in a survey you will often end up underrepresenting some groups in the population.
I've looked at the census data for gender and age, so will just look at them. When I compare this to the Census, I see we have a few too many females. We should have 51%, not 56%.
Too many 18 to 24s, and too few people aged 55 to 64. But, none of the differences are huge, so we don't need to weight it. If you do want to know how to weight, we've got both an eBook and a webinar on it.
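Although weighting isn't needed here, the basic calculation for a single variable can be sketched as follows. The 56% and 51% figures for females come from the comparison above. The male shares are an assumption that gender has only two categories. This is simple cell weighting on one variable, not the raking used when balancing several variables at once.

```python
def cell_weights(sample_shares, target_shares):
    """Weight for each group = target share / sample share."""
    return {g: target_shares[g] / sample_shares[g] for g in sample_shares}


# Female shares are from the comparison with the Census; male shares are
# assumed to be the remainder.
sample = {"Female": 0.56, "Male": 0.44}
target = {"Female": 0.51, "Male": 0.49}

weights = cell_weights(sample, target)

# After weighting, each group's share of the total matches the target.
weighted_total = sum(sample[g] * weights[g] for g in sample)
weighted_female = sample["Female"] * weights["Female"] / weighted_total

print({g: round(w, 3) for g, w in weights.items()})
print(round(weighted_female, 2))
```

Over-represented groups get weights below 1 and under-represented groups get weights above 1.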
Filtering
Filtering is the process of running analyses on only a subset of the data. You will remember we earlier filtered our text data to look at the low-quality responses. We will do more filtering later.
Overview - Planned analyses
This next topic is the thing that really separates out expert commercial survey researchers from the rest. Well before you look at your data, you need to very carefully identify the key things you need to work out. What novices do instead is they write a questionnaire but don't ever take the time to work through how they are going to analyze it, and this causes trouble when it comes time to do the analyses.
The specific plan that you will have depends entirely on what you are interested in. There's no standard plan.
Analysis plan
Here's my analysis plan for this survey. I will work through it. The first thing is, is the concept viable?
11% have said they would definitely buy it. People tend to exaggerate how likely they are to buy things, so you typically need to compare this data to benchmarks. The benchmark I'm using for this survey is 25%. So, we are a long way behind the benchmark.
Let's look at the other bits of data that we planned to look at. But, before I do this, note that we've got 44% of people saying they would definitely not buy.
If we are going to find opportunities to improve the product to make it more appealing, we need to focus on these three middle categories. These people that aren't definitive one way or another.
Here's our table from before of Dislikes. We're going to filter it now, and just look at the data of people that said they Probably would buy or Might or Might not buy, or Probably wouldn't buy.
In Displayr:
New filter
Data > Purchase intent
Choose middle three boxes
Create filter
What we want to see is one big category of dislikes, as then we know what we need to focus on. The two biggest dislikes are the Apple brand and concerns about Security. But, they're both pretty niche.
Let's look at uniqueness. Most people are viewing it as somewhat different. So, the problem isn't that it's perceived as a "me too" product.
Only a few people are thinking it fits poorly with Apple. So that's not the problem.
Crosstabs
The most used tool in survey analysis is the crosstab. I'm going to explain it by reference to filtering.
Let's say we wanted to understand if purchase intent differs by gender. We can do a filter. Let's filter for men.
In Displayr:
Filter for men
Now, let's filter for women.
In Displayr:
Duplicate
Remove filter
Filter for women.
Looking at this we can see that the females are a bit more likely to say Definitely will buy, with a score of 13% versus 10%. Now, when you are wanting to look at surveys, you are always wanting to do analyses like these, so we need a faster way than filtering.
This is called a Crosstab. Each column has a separate filter, in this case based on the categories in Gender. You can see it says Column % in the table. This is to remind us that the filters are in the columns.
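To make the "one filter per column" idea concrete, here is a minimal sketch that computes column percentages from raw rows. The toy data is hypothetical and much smaller than the real survey.

```python
from collections import Counter, defaultdict


def column_percentages(rows, row_key, col_key):
    """Crosstab of row_key by col_key, with each column summing to 100%."""
    cell_counts = Counter((r[row_key], r[col_key]) for r in rows)
    col_totals = Counter(r[col_key] for r in rows)
    table = defaultdict(dict)
    for (row_val, col_val), n in cell_counts.items():
        table[row_val][col_val] = 100 * n / col_totals[col_val]
    return dict(table)


# Hypothetical toy data: one row per respondent.
rows = (
    [{"gender": "Male", "intent": "Definitely buy"}] * 1
    + [{"gender": "Male", "intent": "Other"}] * 9
    + [{"gender": "Female", "intent": "Definitely buy"}] * 2
    + [{"gender": "Female", "intent": "Other"}] * 8
)

table = column_percentages(rows, row_key="intent", col_key="gender")
print(table["Definitely buy"])
```

Each column is computed from only the respondents in that gender category, which is exactly what applying a filter per column would do.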
The next question we have to ask is this: how meaningful is the difference between the 10% purchase intention for men versus the 13% for women? Is the difference reliable? Or, is it just a fluke? Fortunately, this is a topic that the whole discipline of statistics has focused on solving.
The arrows are telling us whether the differences between the filter groups are reliable enough to tell other people about. There are no arrows in the first row, so we can't conclude there is a difference between the I would definitely buy it scores of the men versus the women. Yes, we do get what's called a significant difference in the third row, but as that row's not very interesting, this significant difference is immaterial.
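One common way to test a difference like 13% versus 10% is a two-proportion z-test, sketched below. This is an illustration only: the group sizes (158 women, 125 men) are assumptions, and the test Displayr actually uses to draw its arrows may differ.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# 13% of women versus 10% of men; the group sizes are assumed.
z, p_value = two_proportion_z_test(0.13, 158, 0.10, 125)
print(f"z={z:.2f}, p={p_value:.2f}")
```

With group sizes like these, the p-value is well above the conventional 0.05 threshold, which is consistent with there being no arrows in the first row.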
When you do surveys, you tend to have to do lots of analyses like these. So, we can automate the process further. I'm going to automatically create crosstabs comparing purchase intent by all the demographics.
In Displayr:
Insert > More > Tables > Lots of crosstabs
Rows: How likely
Columns: Gender … Income RECODED
Here's the difference by gender we saw before. That's more interesting. Purchase intention is strongly related to age. Note here that I'm looking at the first number in each cell, which is the Column %. 25% for the 18 to 24s. All the way down to 0% for the 65 or older.
We've got a significant difference in Alabama. But, there are only 5 people in Alabama in the study, so I'm going to ignore it.
There's no difference by population density.
In the all-important I would definitely buy it row, there's no difference by education.
There is a higher purchase interest among Black people. Possibly also among Asians, but as we only have 15 of them in the sample, we need to be quite cautious. Even with the Black group, 37's pretty small as sample sizes go.
There are no arrows, so no difference by income.
Stat testing / statistical significance
We've just done stat testing. Now, we move onto finding the story.
Finding the story
The pope of the day asked Michelangelo how he'd carved this most famous of all statues. He said, "It's simple. I just remove everything that's not David."
This is also the key principle of doing useful analysis and reporting. We just go and delete everything that's not interesting.
We want to structure the information so that the key bit is at the very beginning. Then, supporting information, and then, more detail. Like a pyramid.
And, gloss it up a bit.
This is a mix of the key conclusion and some supporting material, so we need to pull it apart.
In Displayr:
Go to Case Study
Title: The iLock's a Loser
Draw box over concept
Line width: 0
Color: fa614b / 66
Box over top 11%
That's about 11%
Line width: 0
Color: fa614b
And, we need to make the age pattern we found before clearer.
In Displayr:
Chart > Stacked column chart
Appearance > Highlight > No
“The younger somebody is, the more interested they are in buying (the market will grow).”
If we were brave and strong, we would be like Michelangelo and delete everything else. It's just rubble. But, if a little less brave, we can go with the pyramid.
In Displayr:
Drag all the other outputs into it
Delete Recoded table
So, we're done. We can either create a dashboard. Or, export it to PowerPoint.
In Displayr:
Select report. Export > PowerPoint