How to Analyze Free-Form Text Data from Surveys
You want to give your survey respondents the opportunity to answer open-ended questions or elaborate on their responses. But how do you analyze the free-form text data from your survey? I'll show you three different methods and explain when you might want to use each.
What is Free-Form Text Data from Surveys?
Customer feedback surveys often allow respondents to answer questions in their own words. For example, a question may ask “What is the first brand that comes to mind when you think of insurance?” or “What don’t you like about Tom Cruise?”. The data generated from these questions is known variously as text data, free-form text data, verbatims, and open-ended data. There are three main ways of analyzing such data: coding, text analytics, and word clouds.
How do you analyze free-form text data?
1. Coding
The traditional approach to analyzing text data is to code it. Coding works as follows:
- One or two people read through some of the data (e.g., 200 randomly selected responses) and use their judgment to identify the main categories. For example, for the question above about attitudes to Tom Cruise, the categories may be: 1. Like him; 2. Hate him; 3. Don’t know who he is; and 4. Other. The list of categories and their associated codes is known as a code frame.
- Then someone reads all of the text and manually assigns a value or values to each response. Each assigned value corresponds to a code from the code frame created in the previous stage. If the person said, “I really love Tom!”, the code assigned would be 1. Depending on the data, each response will be assigned either one value (single response) or multiple values (multiple response). In the case of the question “What don’t you like about Tom Cruise?”, it would be appropriate to permit multiple responses.
- Variables created in the previous step are then analyzed (e.g., using frequency tables or crosstabs).
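The steps above can be sketched programmatically. The code frame and responses below are hypothetical examples, not real survey data; the point is that once each response has its manually assigned codes, building a frequency table is straightforward:

```python
# Hypothetical code frame for "What don't you like about Tom Cruise?"
CODE_FRAME = {
    1: "Like him",
    2: "Hate him",
    3: "Don't know who he is",
    4: "Other",
}

# Manually assigned codes for each response (multiple response permitted).
coded_responses = [
    {"text": "I really love Tom!", "codes": [1]},
    {"text": "Who is that?", "codes": [3]},
    {"text": "Can't stand him, and his movies are bad", "codes": [2, 4]},
]

def frequency_table(responses, code_frame):
    """Count how many respondents were assigned each code."""
    counts = {code: 0 for code in code_frame}
    for response in responses:
        for code in set(response["codes"]):
            counts[code] += 1
    return counts

print(frequency_table(coded_responses, CODE_FRAME))
```

The human judgment lives entirely in the manual assignment of `codes`; the software only tabulates the result.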
2. Text analytics
Text analytics involves using algorithms to automatically convert text to numbers to perform a quantitative analysis. For example, sentiment analysis automatically calculates the sentiment of phrases based on the number of positive and negative words that appear.
All else being equal, text analytics is less informative than coding, as humans are better at correctly interpreting meaning in text than algorithms. For example, it is hard to train a computer to correctly analyze “I love Coke. Not!” or “Coke is wicked.”
However, coding is very expensive, so text analytics is the usual method for larger quantities of text.
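A minimal sketch of the lexicon-based sentiment analysis described above: the score is simply the count of positive words minus the count of negative words. The word lists here are illustrative (real lexicons contain thousands of entries), and the examples show exactly the failure modes mentioned:

```python
import re

# Tiny illustrative lexicons; production tools use much larger ones.
POSITIVE = {"love", "great", "good", "like", "excellent"}
NEGATIVE = {"hate", "bad", "awful", "terrible", "dislike"}

def sentiment_score(text):
    """Positive-word count minus negative-word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment_score("I love Coke. Not!"))  # scores +1: the sarcasm fools the counter
print(sentiment_score("Coke is wicked."))    # scores 0: the slang is missed entirely
```

Both outputs are wrong in exactly the way a human coder would not be, which is why text analytics trades accuracy for scale.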
3. Word clouds
A word cloud is a visualization that shows the words in a text packed closely together, with font size indicating how frequently each word appears and uninteresting words (e.g., “the”) automatically excluded. A word cloud of answers to “What don’t you like about Tom Cruise?” is shown below. This is the most simplistic approach to analyzing text data, but also the cheapest and fastest.
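Under the hood, the sizing in a word cloud is just word frequency with stop words removed. A minimal sketch (the stop-word list and responses are illustrative):

```python
import re
from collections import Counter

# Small illustrative stop-word list; word-cloud tools ship much longer ones.
STOP_WORDS = {"the", "a", "an", "is", "are", "he", "his", "i", "of", "and", "to"}

def word_frequencies(responses):
    """Count words across all responses, excluding stop words.
    Font size in a word cloud is proportional to these counts."""
    counts = Counter()
    for text in responses:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts

responses = [
    "I don't like his movies",
    "His movies are too long",
    "Nothing, he is great",
]
print(word_frequencies(responses).most_common(3))
```

A rendering library then packs the words on the canvas, scaling each by its count.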
About Tim Bock
Tim Bock is the founder of Displayr. Tim is a data scientist who has consulted, published academic papers, and won awards for problems and techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q (www.qresearchsoftware.com), a data science product designed for survey research, which is used by all of the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.