Sentiment analysis is a way to quantify the feeling or tone of written text. In a survey context, this is a useful technique for gauging the overall attitude towards a brand, product, or feature. In sentiment analysis, each case receives a numeric sentiment score (on a negative to positive scale).

Nothing is ever as accurate as a researcher manually coding text variables, one case at a time. But with a large survey sample (or with Big Data), the efficiency gained from sentiment analysis can outweigh the loss of accuracy when finding the story in your data.

How does it work?

Displayr sends the text variable to an online English dictionary (using R) to score the words as positive, negative, or neutral. Positive words score +1 and negative words score -1, while neutral words add nothing. The final sentiment score is the sum of these word scores. The process also attempts to identify when sentiment has been negated. For example, “not good” would generate a score of -1 instead of +1.

To illustrate, consider the following cases from a hypothetical text variable. The first case receives a sentiment score of +2, while the second case has a score of -2. Each word that contributes to the total is followed by its score (+1 or -1) in brackets:

I really enjoyed (+1) the webinar – it was fun! (+1): Score = +2

I didn’t like (-1) the webinar – because I hate (-1) the speaker: Score = -2

A sentiment score is generated for every respondent in the survey and saved as a numeric variable.
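
The scoring logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Displayr's actual implementation: the word lists, the list of negators, and the rule that a negator flips only the word immediately after it are all assumptions made for this example.

```python
import re

# Tiny illustrative word lists (assumed for this sketch); a real
# sentiment dictionary would contain thousands of entries.
POSITIVE = {"enjoyed", "fun", "good", "like", "love"}
NEGATIVE = {"hate", "bad", "boring", "awful"}
NEGATORS = {"not", "no", "never", "didn't", "don't", "isn't", "wasn't"}


def sentiment_score(text):
    """Sum +1/-1 word scores, flipping a word that directly follows a negator."""
    # Lowercase, normalize curly apostrophes, and split into word tokens.
    normalized = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+", normalized)

    score = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            value = 1
        elif token in NEGATIVE:
            value = -1
        else:
            value = 0  # neutral words add nothing
        # Assumed negation rule: flip the sign if the previous word is a negator.
        if value != 0 and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


print(sentiment_score("I really enjoyed the webinar – it was fun!"))
print(sentiment_score("I didn’t like the webinar – because I hate the speaker"))
print(sentiment_score("not good"))
```

With these word lists, the two example cases score +2 and -2, and “not good” scores -1. A production dictionary and negation rule would be considerably more sophisticated.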

How do you run it in Displayr?

Displayr makes it convenient to compute a sentiment score variable. Simply select the text variable in the Data Tree tab and then select Insert > Text Analysis > Sentiment from the Ribbon.

The result is a new numeric variable in your Data Tree, available for analysis. You can use this variable in a variety of ways:

  • Cross-tabulate it with other questions to see how the sentiment score varies for different groups within the sample.
  • Look at correlations of sentiment scores with other numeric variables (e.g., use Correlation Matrix).
  • Turn the numeric sentiment score variable into a categorical variable to divide your sample into those who are positive, neutral, and negative on the topic.
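
As a rough sketch of the last idea, here is how a numeric score could be binned into three categories and counted by group in Python. The cut-off at zero, the group labels, and the sample data are all hypothetical and chosen just for this example; in Displayr you would do this through the interface rather than in code.

```python
from collections import Counter, defaultdict


def sentiment_category(score):
    """Bin a numeric sentiment score, assuming zero is the neutral point."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


# Hypothetical respondents: (group, sentiment score).
respondents = [
    ("Under 35", 2),
    ("35 and over", -2),
    ("35 and over", 0),
    ("Under 35", 1),
    ("35 and over", 3),
]

# Simple cross-tabulation: count of each sentiment category within each group.
crosstab = defaultdict(Counter)
for group, score in respondents:
    crosstab[group][sentiment_category(score)] += 1

for group, counts in crosstab.items():
    print(group, dict(counts))
```

You may prefer wider bands (for example, treating scores from -1 to +1 as neutral) if short responses make small scores unreliable.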

In some cases, you may want to “clean” your raw text variable before computing the sentiment scores. This is where the Text Analysis Setup feature can help (click here for more detail). In Displayr it is found under Insert > Text Analysis > Advanced > Setup Text Analysis. This creates an R output on the page in which the raw text is processed: spell-checking, stemming, removing words, replacing specific words, and combining words into phrases. To calculate the sentiment scores from the Text Analysis Setup, simply select the Text Analysis Setup output on the page, and then select Insert > Text Analysis > Sentiment from the Ribbon.
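
To give a feel for what this kind of cleaning involves, the Python sketch below covers three of the steps: replacing specific words, removing words, and combining words into phrases. It is not how Text Analysis Setup is implemented, and the replacement, removal, and phrase lists are hypothetical. Spell-checking and stemming are left out because they need a dedicated library.

```python
import re

# Hypothetical cleaning rules, made up for this example.
REPLACEMENTS = {"ms": "microsoft", "gr8": "great"}
REMOVE = {"the", "a", "an", "um"}
PHRASES = {("customer", "service"): "customer_service"}


def clean_text(text):
    """Tokenize, replace specific words, drop unwanted words, merge phrases."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    tokens = [REPLACEMENTS.get(t, t) for t in tokens]
    tokens = [t for t in tokens if t not in REMOVE]

    # Merge adjacent word pairs that form a known phrase.
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            merged.append(PHRASES[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


print(clean_text("The MS customer service is gr8"))
```

The cleaned tokens could then be passed to whatever scoring step you use, so that abbreviations and misspellings do not slip past the sentiment dictionary.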

Try it yourself

You can try this yourself in Displayr with this sample document. Use the last variable in the data set (open-ended attitudes towards Microsoft).

You can also use sentiment analysis with social media data. For a demonstration of how it works on raw social media data, check out the Trump Tweet Case Study, which analyzed Trump’s tweets during the 2016 election. For a demo of our sentiment analysis tool and other advanced text analysis tools, you can view our Text Analysis webinar.