A Beginner’s Guide to Survey Analysis

We’re all reimagining how we do things, developing new skills, pitching in, and figuring stuff out. When it comes to market research and data analysis, we can help fast-track your survey analysis skills.

In this webinar you will learn

If you’re:

  • Starting out in the consumer insights or market research industry
  • Working in client services but have never done analysis yourself
  • Part of a marketing team that now has to DIY

- then let us show you how to analyze survey data!

Transcript

I'm going to walk you through the basics of how to analyze a survey. We've put this together for complete novices to this topic, so if you've never gone through all the steps in analyzing a survey on your own, this webinar is for you.
I will take you through eight stages of analyzing a survey.

Case study

And, we will do it using a case study, where we explore how likely people were to buy this product concept. Please take a moment to read it.

 

Getting the right type of data file

The first stage in analyzing a survey is getting data in the right format. This is the first big mistake that people make when they analyze surveys. They click the export button in their data collection software and get an Excel or CSV file. And they try and analyze this data. But Excel and CSV files weren't invented for survey analysis. Yes, you can use them, but you will double the time it takes to do analysis and you will likely make lots of mistakes.

There are three great data file formats specifically designed for surveys:

  • The most well-known was invented many years ago by a division of IBM called SPSS. This file format is the industry standard and is used by just about all survey analysis software. These files are called SPSS data files or .sav files.
  • Two even better file formats are MDD and Triple S. They're a bit harder to get.
  • And, if you are using Displayr, you can do even better and import your data straight from SurveyMonkey and Qualtrics into Displayr.

 

What marginally OK data looks like

But, let's say you just can't get one of these better data files, and need to use Excel or CSV files. You then often need to spend a bit of time tidying it up before you import it.

You need:

  1. Row 1 to show what the data means.
  2. A unique ID for each respondent.
  3. Each respondent's data in one and only one row.
  4. The data stored numerically, where you can.
  5. Multi-pick questions set up with 1s and 0s, as shown here.
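The checklist above can be turned into a quick automated check before importing. What follows is a minimal Python sketch using only the standard library. The column names (`id`, `q1_a`, `q1_b`) and the sample rows are made up for illustration and are not from the webinar's data file.

```python
import csv
import io

# Hypothetical CSV export: header row, one row per respondent,
# multi-pick question stored as 1/0 columns (q1_a, q1_b).
SAMPLE_CSV = """id,age,q1_a,q1_b
101,34,1,0
102,27,0,1
103,45,1,1
"""

def check_layout(text, id_column, multi_pick_columns):
    """Return a list of problems found in a CSV survey export."""
    problems = []
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return ["no data rows found"]
    if id_column not in rows[0]:
        return [f"missing ID column: {id_column}"]
    ids = [row[id_column] for row in rows]
    # A repeated ID means a respondent is spread over more than one row.
    if len(ids) != len(set(ids)):
        problems.append("duplicate respondent IDs")
    for column in multi_pick_columns:
        values = {row.get(column) for row in rows}
        if not values <= {"0", "1"}:
            problems.append(f"{column} is not coded as 1s and 0s")
    return problems

print(check_layout(SAMPLE_CSV, "id", ["q1_a", "q1_b"]))
```

An empty list means none of these particular problems were found. It does not guarantee the file is otherwise clean.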

 

What terrible data looks like

Terrible data is data that looks different to what I just described. If you try and import data like what's on the screen now, you will end up with lots of problems.

 

What impossible-to-analyze data looks like

And, if your data is already tabulated, you're basically stuffed, and you will just get error messages if you try and import tables like this.

You need what's called the raw data. That is, data with 1 row for each respondent.

 

Cleaning and tidying by question

I will import an SPSS .sav file now.

 

In Displayr:

Add data set

When we import the data, it is represented as variables or sets of variables. Let's take a look at one.

 

Most data files will have a variable in them called ID or something similar. It's the unique code associated with each person that does the survey. For example, the first respondent's code is what's shown here. This data is not so interesting, so let's look at the next variable. This shows how long, in seconds, each interview took. Rather than hover over them one by one, I will start by running a report that summarizes all the data.

 

In Displayr:

Insert > Report > Short Report

 

When you are doing cleaning and tidying, you always want tables rather than charts.

 

Duration (in seconds)

This is how long the questionnaire took to complete, on average, in seconds. That's an average of a bit over 10 minutes. We want to look at the minimum.

 

In Displayr:

Statistics > Cells : Min

 

So, the fastest person did the survey in a little under 4 minutes. That's plausible. If the numbers were implausible, we would need to delete the data with the implausible values.
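Outside Displayr, the same check on interview length might look like the sketch below. The durations and the two-minute cut-off are assumptions chosen for illustration. What counts as implausible depends on the questionnaire.

```python
from statistics import mean

# Hypothetical interview durations in seconds (not the webinar's data).
durations = {"r1": 612, "r2": 655, "r3": 231, "r4": 45, "r5": 590}

# Assumed cut-off: anything under 2 minutes is treated as implausibly fast.
MIN_PLAUSIBLE_SECONDS = 120

def implausibly_fast(durations_by_id, threshold):
    """Return the IDs of respondents who finished faster than the threshold."""
    return sorted(rid for rid, secs in durations_by_id.items() if secs < threshold)

print("average seconds:", mean(durations.values()))
print("minimum seconds:", min(durations.values()))
print("flag for review:", implausibly_fast(durations, MIN_PLAUSIBLE_SECONDS))
```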

 

User language

We've got data on user language. It's showing us the raw data rather than a summary table.

This is because whoever created the data file set it up to show this data as if it was text rather than categories. We can change this to instead show a table of percentages.

 

In Displayr:

Data Manipulation > Percentages

 

So, 100% of people doing the survey speak English.

 

Gender

Note that Displayr's showing percentages by default. People that are new to survey analysis often want to show how many people chose each option.

 

In Displayr:

STATISTICS > CELLS > Count

 

STATISTICS > CELLS > %

 

This is usually the wrong thing to do. This survey is from a study of American adults. Knowing that 166 of them were female isn't interesting.

 

In Displayr:

STATISTICS > CELLS > %

STATISTICS > CELLS > Count

 

But now we are saying that according to our survey, 55% of adults in America are Female. If correct, that's a useful thing to know. This is the goal of surveys. To estimate things about the world outside of the survey itself.
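As a rough illustration of turning counts into percentages, here is a small Python sketch. The 166 females and the base of 300 respondents come from the transcript. The 134 males is an assumption that presumes only two gender categories.

```python
from collections import Counter

# 166 females out of 300 respondents comes from the transcript; the 134
# males is an assumption (it presumes only two gender categories).
responses = ["Female"] * 166 + ["Male"] * 134

def percentages(values):
    """Convert raw answers into a percentage for each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

for category, pct in percentages(responses).items():
    print(f"{category}: {pct:.0f}%")
```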

We will return to gender a bit later.

 

Age

The survey was only given to adults, so this first category isn't interesting. Tidying in this case means hiding the under-18 category.

 

In Displayr:

Data Manipulation > Hide

 

State

It's usually better to look at this data as a map.

 

In Displayr:

Insert > Visualization > Geographic Map

Copy table

Paste into Output in 'Pages' field

 

Population density

The bottom category's pretty small.

 

In Displayr:

STATISTICS > CELLS > Count

 

Only 9 people. That's too small for useful analysis. We need to merge the bottom two categories.

 

In Displayr:

Drag bottom onto second bottom

Data Manipulation > Rename

Edit to: Less than 10,000 people
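Merging categories amounts to relabeling values so that several old labels share one new label. A minimal sketch follows. Only the 9 respondents in the smallest category and the merged label come from the transcript. The other labels and counts are hypothetical.

```python
from collections import Counter

# Hypothetical category labels and counts; only the 9 respondents in the
# smallest category and the merged label come from the transcript.
density = (["1,000,000 or more"] * 120 + ["10,000 to 999,999"] * 140
           + ["1,000 to 9,999"] * 31 + ["Less than 1,000"] * 9)

# Map each small category onto the merged label; others pass through.
MERGE = {
    "1,000 to 9,999": "Less than 10,000 people",
    "Less than 1,000": "Less than 10,000 people",
}

def merge_categories(values, mapping):
    """Relabel values so that the merged categories share one label."""
    return [mapping.get(value, value) for value in values]

merged = Counter(merge_categories(density, MERGE))
print(merged["Less than 10,000 people"])
```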

 

Education

We will merge these too:

In Displayr:

Drag 1st category onto second

Data Manipulation > Rename

Edit to: Never attended college

 

Race

We will merge the smaller categories

 

In Displayr:

Select bottom four categories other than NET

Data Manipulation > Merge

Edit to: Other

 

And, these other labels are too long and will make our report messy

 

Rename

  • White
  • Latino
  • Black
  • Asian

 

Income

Let me first rename the variable.

 

In Displayr:

Click on Data Sets > In which…

GENERAL > Label <> Income

Page title: Income

 

We've got a lot of income categories. One option is to merge them, but a better option is to treat the data as being numeric.

 

In Displayr:

Click the Average button.

 

It's showing me an average income of 17.7. That doesn't make sense! We need to look at the data values to better understand what's going on.

 

In Displayr:

Press DATA VALUES > Values

 

Ah, the way the data has been set up, an income of less than 1000 is a 1, 1000 to 2999 is 2, and so on. What we can do is replace these values with midpoints. For example:

1 -> 500

2  -> 2000

This is called midpoint recoding. We can do this automatically.
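To make the arithmetic concrete, here is a small Python sketch of midpoint recoding. The brackets for codes 1 and 2 follow the transcript and give midpoints of 500 and 2000. The brackets for codes 3 and 4, and the responses, are hypothetical.

```python
from statistics import mean

# Income brackets as (lower, upper) bounds for each code. Codes 1 and 2
# follow the transcript; codes 3 and 4 are hypothetical.
BRACKETS = {
    1: (0, 1000),      # less than 1,000
    2: (1000, 3000),   # 1,000 to 2,999
    3: (3000, 5000),   # assumed
    4: (5000, 10000),  # assumed
}

def midpoint_recode(codes, brackets):
    """Replace each category code with the midpoint of its bracket."""
    return [(brackets[c][0] + brackets[c][1]) / 2 for c in codes]

codes = [1, 2, 2, 4]  # made-up responses
incomes = midpoint_recode(codes, BRACKETS)
print(incomes)
print(mean(incomes))
```

An open-ended top bracket (for example, "100,000 or more") has no natural midpoint, so a value for it has to be chosen by judgment.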

 

In Displayr:

Click percentages to convert back

TRANSFORMATIONS > Midpoint Coding and Quantification

Drag Income - RECODED onto page

Appearance > $

 

That makes more sense.

 

What, if anything … like

You will recall we showed a description of an iLock

 

In Displayr:

Click on Case Study

Go back to What, if anything…

 

We asked them to say, in their own words, what they liked.

When we have text data, we need to categorize this into groups, so that we can then summarize the data like any other data. In survey research, this is often called coding.

 

In Displayr:

Insert > Text Analysis > Manual > Multiple overlapping .. > New

 

OK, so the first response is garbage. I will create a category to store poor quality data, as we will want to delete these respondents later.

 

In Displayr:

Rename category as Poor quality data

Select new category

Categorize as

 

We've got 109 people that said Nothing

 

In Displayr:

Right click on Poor Quality data

Add category

"Nothing"

 

The basic idea is that you read through the responses and categorize them by judgment.

Now, I won't bore you by making you watch. I did it earlier, and I'll load it now.
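One way to picture the result of coding is as a set of 1/0 variables, one per category, where a response can belong to several categories at once. The sketch below assumes that representation. The responses and the "Security" and "Ease of use" categories are invented. Only "Poor quality data" and "Nothing" are category names from the transcript.

```python
# Hypothetical responses and hand-assigned codes; only the category names
# "Poor quality data" and "Nothing" come from the transcript.
coded = {
    "r1": {"text": "asdf", "codes": {"Poor quality data"}},
    "r2": {"text": "nothing", "codes": {"Nothing"}},
    "r3": {"text": "secure and easy to use", "codes": {"Security", "Ease of use"}},
}

CATEGORIES = ["Poor quality data", "Nothing", "Security", "Ease of use"]

def to_indicator_rows(coded_responses, categories):
    """Turn hand-coded responses into one 1/0 variable per category."""
    return {
        rid: {cat: int(cat in entry["codes"]) for cat in categories}
        for rid, entry in coded_responses.items()
    }

indicators = to_indicator_rows(coded, CATEGORIES)
print(indicators["r3"])
```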

 

In Displayr:

Import > Resources - Documents\Data\Concept Test > iLock Likes.Qcodes

 

In our webinar on text analysis, which you can find on our website, you will find lots of ways to automatically analyze text data like this. Now, this causes new variables to be added to the data set. I'll give you a moment to read what people liked about the iLock. Now, if you look at the 3rd row from the bottom of the table, you can see that 5% provided poor quality data.

To get a better idea of what that means I'm going to filter the raw text, so it only shows the poor quality data.

 

In Displayr:

Inputs > FILTERS & WEIGHT > New

Data: What, if anything,

 

I've chosen the data we just created.

 

In Displayr:

Click on Poor quality data

Click Create Filter.

 

So, we asked them what they liked about the product. And these 16 people basically told us junk.

If you think about how surveys work, all the previous questions just asked people to choose options. We have no way of checking if they chose sensibly. This is the first opportunity to see if the people are doing a good job answering, and these 16 people haven't. So, the right thing to do is to delete all their data. If they have given us garbage here, we can't rely on anything they've said.

 

In Displayr:

Click on Data Sets > iLock.sav

Unique Identifier: Response ID

Delete observations

 

This will delete the 16 rows of data. It's not permanent. We can undo it later if we need to.

Note over here on the right of the screen, it tells us that the data set contains 300 cases. But, look at the table. It's now based on 284 cases. All the other analyses in this document have been automatically updated to remove these 16 people with bad data.
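The same idea, sketched in Python: flagged respondents are excluded by ID and the original rows are left untouched, so the step can be reversed. The respondent IDs here are hypothetical. Only the totals of 300 and 16 come from the transcript.

```python
# Hypothetical respondent IDs 1..300; which 16 were flagged is made up.
respondents = [{"id": i} for i in range(1, 301)]
poor_quality_ids = set(range(1, 17))

def drop_respondents(rows, ids_to_drop):
    """Return the rows whose ID is not in ids_to_drop; the input is untouched."""
    return [row for row in rows if row["id"] not in ids_to_drop]

clean = drop_respondents(respondents, poor_quality_ids)
print(len(respondents), len(clean))
```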

 

What, if anything, do you … Dislike

Here we have asked about dislikes.

Title: Dislikes

As with the other text data, we need to code it.

 

In Displayr:

Click on What, if anything… dislike

Insert > Text Analysis > Manual > Multiple overlapping .. > New

 

As with before, I've already done it.

 

In Displayr:

Import - iLock.Dislikes > Save categories

Drag across data

 

This time it shows 0% with poor quality data. But, in surveys, you need to be a bit careful when you see 0%, as there can still be a few people in that category.
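A percentage rounded to a whole number can hide a small non-zero count. A quick sketch, using the base of 284 cases that remained after the first deletion:

```python
def rounded_pct(count, base):
    """Percentage rounded to a whole number, as typically shown in a table."""
    return round(100 * count / base)

base = 284  # cases left after the first deletion, per the transcript
for count in (0, 1, 2):
    print(count, rounded_pct(count, base))
```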

 

In Displayr:

Statistics > Cells > Count

 

Ok, so one person with poor quality data. Let's have a look at their data.

 

In Displayr:

Click raw data table

FILTERS & WEIGHT > New

Data: Choose second What if anything

 

We'll need to give this a unique label

 

Label: Poor quality data - Dislikes

Create filter

 

So much for them all being English speaking! I will delete this person as well.

 

In Displayr:

Click on Data Sets > iLock.sav

Delete observations

Filter: … - Dislikes

 

Which phrase

This is usually called Purchase intention

I'll change the name of the page and the underlying data to match this

 

In Displayr:

Title: Purchase intention

Variable > Purchase intent

 

This data is clean and tidy. Nothing to do.

 

Compared with similar

This is often called uniqueness

 

In Displayr:

Title: Uniqueness

 

This data is clean and tidy. Nothing to do. We will return to this table later.

 

How well…

This is often called Brand fit.

There are too many categories. We should merge some of these categories.

 

How likely …

This is priced purchase intent. Nothing to do here.

 

Browser meta info - Browser

This tells us what type of browser they were using. Let's look at this as percentages:

 

In Displayr:

Data Sets > Browser Meta … > Data Manipulation > Percentages

 

Techniques for cleaning

We've just gone through the process of cleaning and tidying. This page summarizes what we just did.

 

Weighting

Next comes weighting. It's also known as sample balancing, raking, and post-stratification. The basic idea here is that, in a survey, you will often end up underrepresenting some groups in the population.

I've looked at the census data for gender and age, so will just look at them. When I compare this to the Census, I see we have a few too many females. We should have 51%, not 56%.

Too many 18 to 24s, and too few people aged 55 to 64. But, none of the differences are huge, so we don't need to weight it. If you do want to know how to weight, we've got both an eBook and a webinar on it.
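Although no weighting is applied here, the simplest version of the idea can be sketched in a few lines: each group's weight is its target share divided by its sample share. The 56% and 51% female figures are from the transcript. The male shares assume only two gender categories. This is a single-variable illustration, not the method used by any particular tool.

```python
# Sample and census shares for gender. The 56% and 51% female figures are
# from the transcript; the male shares assume only two categories.
sample_share = {"Female": 0.56, "Male": 0.44}
target_share = {"Female": 0.51, "Male": 0.49}

def cell_weights(sample, target):
    """Weight for each group = target share / sample share."""
    return {group: target[group] / sample[group] for group in sample}

weights = cell_weights(sample_share, target_share)
for group, weight in weights.items():
    print(f"{group}: {weight:.2f}")

# Sanity check: the weighted shares should reproduce the targets.
weighted = {g: sample_share[g] * weights[g] for g in sample_share}
```

Over-represented groups get a weight below 1 and under-represented groups get a weight above 1. Weighting on several variables at once (for example, gender and age together) is usually done iteratively, which is what raking refers to.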

 

Filtering

Filtering is the process of running analyses on only a subset of the data. You will remember we earlier filtered our text data to look at the low-quality responses. We will do more filtering later.

 

Overview - Planned analyses

This next topic is the thing that really separates out expert commercial survey researchers from the rest. Well before you look at your data, you need to very carefully identify the key things you need to work out. What novices do instead is they write a questionnaire but don't ever take the time to work through how they are going to analyze it, and this causes trouble when it comes time to do the analyses.

The specific plan that you will have depends entirely on what you are interested in. There's no standard plan.

 

Analysis plan

Here's my analysis plan for this survey. I will work through it. The first thing is, is the concept viable?

11% have said they would definitely buy it. People tend to exaggerate how likely they are to buy things, so you typically need to compare this data to benchmarks. The benchmark I'm using for this survey is 25%. So, we are a long way behind the benchmark.

  1. What's next? We need to compare our purchase intention priced versus unpriced.

Let's look at the other bits of data that we planned to look at. But, before I do this, note that we've got 44% of people saying they would definitely not buy.

If we are going to find opportunities to improve the product to make it more appealing, we need to focus on these three middle categories. These people that aren't definitive one way or another.

Here's our table from before of Dislikes. We're going to filter it now, and just look at the data of people that said they Probably would buy or Might or Might not buy, or Probably wouldn't buy.

 

In Displayr:

New filter

Data  > Purchase intent

Choose middle three boxes

Create filter

 

What we want to see is one big category of dislikes, as then we know what we need to focus on. The two biggest dislikes are the Apple brand and concerns about security. But, they're both pretty niche.

Let's look at uniqueness. Most people are viewing it as somewhat different. So, the problem isn't that it's perceived as a "me too" product.

Only a few people are thinking it fits poorly with Apple. So that's not the problem.

 

Crosstabs

The most used tool in survey analysis is the crosstab. I'm going to explain it by reference to filtering.

Let's say we wanted to understand if purchase intent differs by gender. We can do a filter. Let's filter for men.

 

In Displayr:

Filter for men

 

Now, let's filter for women.

 

In Displayr:

Duplicate

Remove filter

Filter for women.

 

Looking at this, we can see that the females are a bit more likely to say Definitely will buy, with a score of 13% versus 10%. When you analyze surveys, you always want to do comparisons like these, so we need a faster way than filtering.

This is called a Crosstab. Each column has a separate filter, in this case based on the categories in Gender. You can see it says Column % in the table. This is to remind us that the filters are in the columns.
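To show what "one filter per column" means in practice, here is a minimal Python sketch that computes column percentages from respondent-level answers. The answers are made up so that the percentages come out at 10% for men and 13% for women. They are not the webinar's data, and the row labels are placeholders.

```python
from collections import Counter, defaultdict

# Made-up (gender, purchase intent) pairs, not the webinar's data.
answers = (
    [("Male", "Definitely buy")] * 10 + [("Male", "Other answer")] * 90
    + [("Female", "Definitely buy")] * 13 + [("Female", "Other answer")] * 87
)

def column_percentages(pairs):
    """Crosstab with column %: within each column group, the share per row."""
    by_column = defaultdict(Counter)
    for column, row in pairs:
        by_column[column][row] += 1
    return {
        column: {row: 100 * n / sum(counts.values()) for row, n in counts.items()}
        for column, counts in by_column.items()
    }

table = column_percentages(answers)
print(table["Male"]["Definitely buy"], table["Female"]["Definitely buy"])
```

Each column's percentages add up to 100, because each column is calculated only from the people in that group.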

The next question we have to ask is this: how meaningful is the difference between the 10% purchase intention for men versus the 13% for women? Is the difference reliable? Or, is it just a fluke? Fortunately, this is a topic that the whole discipline of statistics has focused on solving.

The arrows are telling us whether the differences between the filter groups are reliable enough to tell other people about. There are no arrows in the first row, so we can't conclude there is a difference between the "I would definitely buy it" scores of the men versus the women. Yes, we do get what's called a significant difference in the third row, but as that row's not very interesting, this significant difference is immaterial.
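For a feel of what a significance test does, here is a sketch of a textbook two-proportion z-test in Python. This is a stand-in chosen for illustration and is not necessarily the test Displayr uses to draw its arrows. The counts and group sizes are hypothetical, picked to give roughly 10% of men and 13% of women.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two independent proportions.

    x1, x2 are the counts choosing the answer; n1, n2 are the group sizes.
    Returns (z, p_value) using the pooled standard error.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts chosen to give roughly 10% of men and 13% of women.
z, p_value = two_proportion_z_test(12, 125, 21, 158)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With groups of this size, a gap of a few percentage points gives a p-value well above the conventional 0.05 cut-off, which fits with there being no arrows in the first row.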

When you do surveys, you tend to have to do lots of analyses like these. So, we can automate the process further. I'm going to automatically create crosstabs comparing purchase intent by all the demographics.

 

In Displayr:

Insert > More > Tables > Lots of crosstabs

Rows: How likely

Columns: Gender … Income RECODED

 

Here's the difference by gender we saw before. This one is more interesting: purchase intention is strongly related to age. Note here that I'm looking at the first number in each cell, which is the Column %. It's 25% for the 18 to 24s, falling all the way down to 0% for the 65 or older.

We've got a significant difference in Alabama. But there are only 5 people in Alabama in the study, so I'm going to ignore it.

There's no difference by population density.

In the all-important "I would definitely buy it" row, there's no difference by education.

There is a higher purchase interest among Black people. Possibly also among Asians, but as we only have 15 of them in the sample, we need to be quite cautious. Even with the Black group, 37's pretty small as sample sizes go.

There are no arrows, so no difference by income.

 

Stat testing / statistical significance

We've just done stat testing. Now, we move onto finding the story.

 

Finding the story

The pope of the day asked Michelangelo how he'd carved this most famous of all statues. He said, "It's simple. I just remove everything that's not David."

This is also the key principle of doing useful analysis and reporting. We just go and delete everything that's not interesting.

We want to structure the information so that the key bit is at the very beginning. Then, supporting information, and then, more detail. Like a pyramid.

And, gloss it up a bit

This is a mix of the key conclusion and some supporting material, so we need to pull it apart.

 

In Displayr:

Go to Case Study

Title: The iLock's a Loser

Draw box over concept

Line width: 0

Color: fa614b / 66

Box over top 11%

That's about 11%

Line width: 0

Color: fa614b

 

And, we need to make the age pattern we found before clearer.

 

In Displayr:

Chart > Stacked column chart

Appearance > Highlight > No

“The younger somebody is, the more interested they are in buying (the market will grow).”

 

If we were brave and strong, we would be like Michelangelo and delete everything else. It's just rubble. But, if a little less brave, we can go with the pyramid.


In Displayr:
Drag all the other outputs into it

Delete Recoded table

So, we're done. We can either create a dashboard or export it to PowerPoint.


In Displayr:

Select report. Export > PowerPoint
