How to Show Sentiment in Word Clouds using Displayr

by Tim Bock

The Word Cloud above summarizes data from tweets by President Trump. Green words are significantly more likely to be used in tweets with a positive sentiment, and red words are significantly more likely to be used in tweets with a negative sentiment. This post describes the basic process for creating such a Word Cloud in Displayr. For a more general discussion of the logic behind the code below, see How to Show Sentiment in Word Clouds.
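
In the Step 4 code, "significantly" has a specific meaning. As a sketch of what that code computes (the notation here is introduced for this summary), write P_w for the set of phrases that contain word w, n_w for the number of such phrases, and s_i for the sentiment score of phrase i, truncated to the range [-1, 1]. Each word then gets a z-score:

```latex
\bar{s}_w = \frac{1}{n_w} \sum_{i \in P_w} s_i, \qquad
\mathrm{sd}_w = \sqrt{\frac{1}{n_w - 1} \sum_{i \in P_w} \left(s_i - \bar{s}_w\right)^2}, \qquad
z_w = \frac{\bar{s}_w}{\mathrm{sd}_w / \sqrt{n_w}}
```

A word is colored green when z_w > 1.96, red when z_w < -1.96 (roughly a two-sided test at the 5% level), and grey otherwise. Words that appear in three or fewer phrases are given z_w = 0, so they are always grey.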

Step 1: Importing the data

This post assumes that you have already imported a data file containing a variable with the phrases you want to use to create the Word Cloud. If your data is in some other format, instead use Insert > R Output with the code and instructions described in How to Show Sentiment in Word Clouds using R.

To reproduce the Word Cloud shown above, select Insert > Data Set (data), click R, and then:

Set the Name to trumpTweats
Enter the code below
Press OK

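# Download the tweets (this loads a data frame called trump_tweets_df)
load(url("http://varianceexplained.org/files/trump_tweets_df.rda"))
# Remove links: everything from "http" to the end of each tweet is deleted
trump_tweets_df$text <- gsub("http.*", "", trump_tweets_df$text)
# Return the data frame so that it becomes the imported data set
trump_tweets_df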
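
The pattern "http.*" removes everything from the first "http" to the end of the tweet, so any words after a link are dropped as well. The sketch below runs that pattern on two made-up tweets (hypothetical text, not taken from the data set), alongside a narrower alternative, "http\\S*", which is not part of the original code and removes only the link itself:

```r
# Two made-up tweets, used only to illustrate the regular expressions
tweets <- c("Great rally tonight! https://t.co/abc123",
            "Read this http://example.com and then vote")

# Pattern used in Step 1: deletes from "http" to the end of the string
cleaned <- gsub("http.*", "", tweets)

# Hypothetical alternative: deletes "http" and the non-space characters after it
narrow <- gsub("http\\S*", "", tweets)

print(cleaned)
print(narrow)
```

With the Step 1 pattern, the second tweet loses "and then vote"; with the narrower pattern those words are kept.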

Step 2: Extracting the words

Insert > Text Analysis > Advanced > Setup Text Analysis
Set the Text Variable to text (the name of the variable containing the tweets)
Check the Automatic option at the top.
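
The Step 4 code reads three components from the object that Setup Text Analysis creates (text.analysis.setup): transformed.tokenized, final.tokens, and final.counts. As a rough, hypothetical stand-in for experimenting outside Displayr, the sketch below builds a list with those three names from a few made-up phrases. It assumes that transformed.tokenized holds one character vector of words per phrase, that final.tokens is the vocabulary, and that final.counts gives how often each word occurs. The real Setup Text Analysis applies its own cleaning options, which this crude tokenizer does not reproduce.

```r
# Made-up phrases (hypothetical, for illustration only)
phrases <- c("Great rally, great crowd",
             "Fake news is very dishonest",
             "Great news for jobs")

# Crude tokenizer: lower-case each phrase and split on anything that is not a letter or apostrophe
tokenized <- lapply(tolower(phrases), function(p) {
  words <- unlist(strsplit(p, "[^a-z']+"))
  words[words != ""]
})

# Count how often each word occurs across all phrases
counts <- table(unlist(tokenized))

# Hypothetical mock of text.analysis.setup with only the fields Step 4 uses
mock.setup <- list(
  transformed.tokenized = tokenized,  # one character vector per phrase
  final.tokens = names(counts),       # the vocabulary
  final.counts = as.vector(counts)    # occurrences of each word
)
str(mock.setup)
```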

Step 3: Sentiment for the phrases (tweets)

On the Data Sets pane, select the first variable (it is called text)
Insert > Text Analysis > Sentiment
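
Displayr computes the phrase-level sentiment scores for you. Purely to illustrate the kind of score involved, the sketch below uses one common, simple approach (counting positive words minus negative words) with a tiny made-up lexicon. It is not Displayr's algorithm, and the word lists and phrases are hypothetical. The last step truncates the scores to [-1, 1], as the Step 4 code does.

```r
# Tiny made-up lexicons (hypothetical; real sentiment lexicons are far larger)
positive <- c("great", "good", "win", "jobs")
negative <- c("fake", "dishonest", "bad", "sad")

phrases <- c("Great rally, great crowd",
             "Fake news is very dishonest",
             "Great news for jobs",
             "Meeting today")

# Score a phrase as (number of positive words) - (number of negative words)
score_phrase <- function(p) {
  words <- unlist(strsplit(tolower(p), "[^a-z']+"))
  sum(words %in% positive) - sum(words %in% negative)
}
scores <- vapply(phrases, score_phrase, numeric(1), USE.NAMES = FALSE)

# Truncate to [-1, 1], as the Step 4 code does
capped <- pmax(pmin(scores, 1), -1)
print(scores)
print(capped)
```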

Step 4: Sentiment for each word

Insert > R Output
Paste in the code below

As discussed in How to Show Sentiment in Word Clouds, your Word Cloud may look a bit different, and you need to check that no long words are missing (words that do not fit in the cloud can be left out). Also, if you have tried these steps several times in the same project, you will need to update the variable, R Output, and question names used in the code to match the ones in your project.

# Sentiment analysis of the phrases (the scores created in Step 3)
phrase.sentiment = `Sentiment scores from text.analysis.setup`
# Truncate the phrase-level scores to the range [-1, 1]
phrase.sentiment[phrase.sentiment >= 1] = 1
phrase.sentiment[phrase.sentiment <= -1] = -1

# Sentiment analysis of the words
final.tokens = text.analysis.setup$final.tokens
# Phrase-by-word indicator matrix: td[i, j] is 1 if word j appears in phrase i
td = t(vapply(text.analysis.setup$transformed.tokenized, function(x) {
    as.integer(final.tokens %in% x)
}, integer(length(final.tokens))))
counts = text.analysis.setup$final.counts 
# Give each word the sentiment score of every phrase it appears in
phrase.word.sentiment = sweep(td, 1, phrase.sentiment, "*")
phrase.word.sentiment[td == 0] = NA # Absent words are missing, not zero sentiment
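# Mean, standard deviation, and number of phrases for each word
word.mean = apply(phrase.word.sentiment, 2, FUN = mean, na.rm = TRUE)
word.sd = apply(phrase.word.sentiment, 2, FUN = sd, na.rm = TRUE)
word.n = colSums(!is.na(phrase.word.sentiment))
# z-score of each word's mean sentiment
word.se = word.sd / sqrt(word.n)
word.z = word.mean / word.se
# Element-wise | (not ||): words in 3 or fewer phrases, or with no SE, get z = 0
word.z[word.n <= 3 | is.na(word.se)] = 0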
words = text.analysis.setup$final.tokens
x = data.frame(word = words, 
      freq = counts, 
      "Sentiment" = word.mean,
      "Z-Score" = word.z,
      Length = nchar(words))
word.data = x[order(counts, decreasing = TRUE), ]

# Working out the colors: green if significantly positive, red if significantly negative
n = nrow(word.data)
colors = rep("grey", n)
colors[word.data$Z.Score < -1.96] = "Red"
colors[word.data$Z.Score > 1.96] = "Green"

# Creating the word cloud
library(wordcloud2)
wordcloud2(data = word.data[, -3], color = colors, size = 0.4)
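
To see the word-level calculation in isolation, the sketch below repeats the core of the Step 4 code on made-up data: 12 hypothetical phrases with scores already truncated to [-1, 1], and an indicator matrix for four hypothetical words. It runs without Displayr or text.analysis.setup.

```r
# Made-up phrase-level sentiment scores, already truncated to [-1, 1]
phrase.sentiment <- c(1, 1, 1, 0.5, 1, -1, -1, -1, -0.5, 0, 1, -1)

# Made-up phrase-by-word indicator matrix: 1 if the word appears in the phrase
td <- cbind(
  great = c(1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0),
  fake  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1),
  news  = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0),
  today = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0)
)

# Each word takes the sentiment of the phrases it appears in; absent words are NA
phrase.word.sentiment <- sweep(td, 1, phrase.sentiment, "*")
phrase.word.sentiment[td == 0] <- NA

# Per-word mean, standard deviation, count, and z-score
word.mean <- colMeans(phrase.word.sentiment, na.rm = TRUE)
word.sd <- apply(phrase.word.sentiment, 2, sd, na.rm = TRUE)
word.n <- colSums(!is.na(phrase.word.sentiment))
word.se <- word.sd / sqrt(word.n)
word.z <- word.mean / word.se
word.z[word.n <= 3 | is.na(word.se)] <- 0

# Same color rule as the Step 4 code
colors <- rep("grey", length(word.z))
colors[word.z < -1.96] <- "Red"
colors[word.z > 1.96] <- "Green"
names(colors) <- names(word.z)

print(round(word.z, 2))
print(colors)
```

Here great comes out green and fake red. The word news is grey because its z-score is only about 0.52, and today is grey because it appears in just two phrases.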
