How to Show Sentiment in Word Clouds using Q
The Word Cloud above summarizes tweets by President Trump. Green words are significantly more likely to be used in tweets with a positive sentiment, and red words are significantly more likely to be used in tweets with a negative sentiment. This post describes the basic process for creating such a Word Cloud in Q. For a more general discussion of the logic behind the code below, please read How to Show Sentiment in Word Clouds.
Step 1: Importing the data
This post assumes that you have already imported a data file that contains a variable holding the phrases you want to use to create the Word Cloud. If your data is in some other format, use Create > R Output instead, and follow the code and instructions in How to Show Sentiment in Word Clouds using R.
If you want to reproduce the Word Cloud from above, use File > Data Sets > Add to Project > From R, and:
- Set the Name to trumpTweets
- Enter the code below.
- Press the play button (the blue triangle).
- Press Add data set and OK.
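```r
# Load the trump_tweets_df data frame
load(url("http://varianceexplained.org/files/trump_tweets_df.rda"))
# Remove links: everything from "http" to the end of each tweet
trump_tweets_df$text = gsub("http.*", "", trump_tweets_df$text)
trump_tweets_df
```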
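The `gsub()` call above strips links from the tweets. Because the pattern `http.*` matches from the first "http" to the end of the string, it also removes any text that follows a link. The sketch below shows this on two made-up tweets; the alternative pattern `https?://\\S+` is our own assumption, not part of the original recipe, and removes only the link itself.

```r
# Made-up example tweets (illustrative only)
tweets = c("Big crowd tonight https://t.co/abc Thank you!",
           "No links here")

# Pattern used in the recipe: drops everything from "http" to the end
gsub("http.*", "", tweets)
# → "Big crowd tonight " "No links here"

# Alternative (an assumption, not from the original post): drop only the URL token
gsub("https?://\\S+", "", tweets)
# → "Big crowd tonight  Thank you!" "No links here"
```

Which pattern is preferable depends on whether text after a link matters for the analysis; the recipe's simpler pattern is fine when links usually end a tweet.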
Step 2: Extracting the words
- Create > Text Analysis > Setup Text Analysis
- Set Text Variable to text (the name of the variable containing the tweets)
- Check the Automatic option at the top
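Conceptually, this step turns each phrase into word counts. The base-R sketch below builds a small phrase-by-word count matrix from three made-up phrases. It is only an illustration and does not reproduce Q's automatic text setup, which may also remove common words and merge word variants. In the Step 4 code, the matrix `td` plays this role, with one row per phrase and one column per word.

```r
# Made-up phrases standing in for tweets (illustrative only)
phrases = c("great rally tonight", "fake news is sad", "great news")

# Lower-case each phrase and split it into words on whitespace
tokens = strsplit(tolower(phrases), "\\s+")

# Vocabulary: every distinct word across all phrases
vocab = sort(unique(unlist(tokens)))

# Phrase-by-word count matrix: one row per phrase, one column per word
td = t(vapply(tokens,
              function(tk) as.numeric(table(factor(tk, levels = vocab))),
              numeric(length(vocab))))
colnames(td) = vocab
td
colSums(td)  # how often each word occurs across all phrases
```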
Step 3: Sentiment for the phrases (tweets)
- Go to the Variables and Questions tab
- Select the first variable (it is called text)
- Create > Text Analysis > Techniques > Save Sentiment Scores
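This post does not describe how Q computes its sentiment scores. As a rough illustration of the general idea only, the sketch below uses a simple lexicon approach: each phrase scores +1 for every positive word and -1 for every negative word. The six-word lexicon is hypothetical, and real sentiment lexicons are far larger. Because scores like these can fall outside the range -1 to 1, the sketch also clips them, as the Step 4 code does before averaging.

```r
# Hypothetical mini-lexicon (a real analysis would use a full sentiment lexicon)
positive = c("great", "win", "love")
negative = c("sad", "fake", "bad")

phrases = c("great rally tonight", "fake news is sad", "great news")

# Score = number of positive words minus number of negative words
score.phrase = function(p) {
  words = strsplit(tolower(p), "\\s+")[[1]]
  sum(words %in% positive) - sum(words %in% negative)
}
scores = vapply(phrases, score.phrase, numeric(1), USE.NAMES = FALSE)
scores   # 1 -2 1

# Clip to the range [-1, 1]
clipped = pmax(pmin(scores, 1), -1)
clipped  # 1 -1 1
```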
Step 4: Sentiment for each word
- Create > R Output
- Paste in the code below
- Press Calculate and you will have the Word Cloud!
As discussed in How to Show Sentiment in Word Clouds, your Word Cloud may look a bit different, and you should check that no long words are missing from it. Also, if you have tried these steps a few times in the same project, you will need to update the variable, R Output, and question names used in the code so that they match the items you have just created.
```r
# Sentiment analysis of the phrases (scores saved in Step 3), capped at -1 and 1
phrase.sentiment = `Sentiment scores from text.analysis.setup`
phrase.sentiment[phrase.sentiment >= 1] = 1
phrase.sentiment[phrase.sentiment <= -1] = -1

# Sentiment analysis of the words
# td: phrase-by-word matrix (one row per phrase, one column per word)
td = as.matrix(AsTermMatrix(text.analysis.setup, min.frequency = 1.0, sparse = TRUE))
counts = text.analysis.setup$final.counts
# Multiply each row by its phrase's sentiment
phrase.word.sentiment = sweep(td, 1, phrase.sentiment, "*")
# Words that do not occur in a phrase are treated as missing, not as 0
phrase.word.sentiment[td == 0] = NA
word.mean = apply(phrase.word.sentiment, 2, FUN = mean, na.rm = TRUE)
word.sd = apply(phrase.word.sentiment, 2, FUN = sd, na.rm = TRUE)
word.n = apply(!is.na(phrase.word.sentiment), 2, FUN = sum, na.rm = TRUE)
word.se = word.sd / sqrt(word.n)
word.z = word.mean / word.se
# Set z to 0 for words in 3 or fewer phrases or with no standard error (| is elementwise)
word.z[word.n <= 3 | is.na(word.se)] = 0
words = text.analysis.setup$final.tokens
# data.frame() converts the name "Z-Score" to Z.Score
x = data.frame(word = words, freq = counts, "Sentiment" = word.mean,
               "Z-Score" = word.z, Length = nchar(words))
word.data = x[order(counts, decreasing = TRUE), ]

# Working out the colors: |z| > 1.96 is significant at the 5% level (two-sided)
n = nrow(word.data)
colors = rep("grey", n)
colors[word.data$Z.Score < -1.96] = "Red"
colors[word.data$Z.Score > 1.96] = "Green"

# Creating the word cloud; dropping column 3 (Sentiment) leaves word and freq
# as the first two columns, which wordcloud2 uses for the words and their sizes
library(wordcloud2)
wordcloud2(data = word.data[, -3], color = colors, size = 0.4)
```
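To see what the word-level part of the code does, the self-contained sketch below runs the same calculation on a toy example: six made-up phrases with sentiment scores already clipped to [-1, 1], and a hand-built matrix marking which of three words appear in each phrase. Each word's z-score is the mean sentiment of the phrases containing it divided by its standard error, and words found in three or fewer phrases get a z-score of 0. The data are illustrative only and do not come from the Trump tweets; `colMeans()` and `colSums()` stand in for the equivalent `apply()` calls.

```r
# Toy inputs (illustrative only): sentiment of six phrases, already clipped to [-1, 1]
phrase.sentiment = c(1, 1, 0.5, 1, -1, -1)

# Toy phrase-by-word matrix: 1 means the word occurs in that phrase
td = cbind(great = c(1, 1, 1, 1, 0, 0),
           news  = c(0, 1, 1, 0, 1, 1),
           sad   = c(0, 0, 0, 0, 1, 1))

# Give each occurrence its phrase's sentiment; absent words become NA
phrase.word.sentiment = sweep(td, 1, phrase.sentiment, "*")
phrase.word.sentiment[td == 0] = NA

word.mean = colMeans(phrase.word.sentiment, na.rm = TRUE)
word.sd = apply(phrase.word.sentiment, 2, sd, na.rm = TRUE)
word.n = colSums(!is.na(phrase.word.sentiment))
word.se = word.sd / sqrt(word.n)
word.z = word.mean / word.se

# Words seen in 3 or fewer phrases, or without a defined standard error, get z = 0
word.z[word.n <= 3 | is.na(word.se)] = 0

# |z| > 1.96 is significant at the 5% level (two-sided)
colors = ifelse(word.z > 1.96, "Green", ifelse(word.z < -1.96, "Red", "grey"))
data.frame(n = word.n, mean = word.mean, z = round(word.z, 2), color = colors)
```

Here "great" appears only in positive phrases, so it gets z = 7 and is colored green. "news" appears in both positive and negative phrases (z of about -0.24) and stays grey. "sad" is also grey: both of its phrases are negative, but it occurs in only two phrases, so its z-score is set to 0.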
About Tim Bock
Tim Bock is the founder of Displayr. Tim is a data scientist who has consulted, published academic papers, and won awards for problems and techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca-Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q www.qresearchsoftware.com, a data science product designed for survey research, which is used by all of the world's seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia's leading research university), where he was an adjunct member of staff for 15 years.