A long time ago, in a galaxy far, far away, Tom Cruise jumped over a couch and landed on everybody else’s nose. Most people seem to have forgiven him, but at the time, he was unpopular. He’d just broken up with Nicole Kidman. This post is going to show you why Displayr is the best tool for word clouds, using data from that time and galaxy. The data set contains 300 responses from people telling us why they dislike Tom Cruise. The raw data is below.
Creating the word cloud
Word clouds are created in Displayr by:
- Importing a data set into Displayr (Home > Data Set (Data)).
- Dragging a text variable from the data tree.
- Selecting Home > Chart (Charts) > Word cloud.
If you want to try this for yourself using this data, click here.
Automatic removal of uninteresting words
Displayr automatically performs some basic text analytics when you create a word cloud, choosing to ignore words that it regards as uninteresting. You can drag additional words into the Ignore bucket, or drag ignored words back out.
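For readers who want to see the idea behind ignoring uninteresting words, here is a minimal sketch in Python. It is not Displayr's implementation: the ignore list, the function name word_counts, and the three sample responses are all made up for illustration. It counts how many responses mention each word, after dropping the ignored words.

```python
import re
from collections import Counter

# Hypothetical ignore list for illustration only; this is not Displayr's list.
IGNORE = {"the", "a", "an", "and", "is", "he", "his", "i", "of", "to", "in", "it"}

def word_counts(responses, ignore=IGNORE):
    """Count how many responses mention each word, skipping ignored words."""
    counts = Counter()
    for text in responses:
        # A set, so each response contributes at most once per word.
        words = set(re.findall(r"[a-z']+", text.lower()))
        counts.update(words - ignore)
    return counts

# Made-up responses, not the survey data.
responses = [
    "He is arrogant",
    "The arrogance of the man",
    "Scientology and his arrogance",
]
counts = word_counts(responses)
print(counts.most_common(3))
```

Adding a word to the ignore set here plays the same role as dragging it into the Ignore bucket, and removing it from the set is the equivalent of dragging it back.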
Drag and drop to merge words
You can also click on words and drag them onto other words. For example, we can drag arrogent onto arrogance. Displayr shows us that the new word is arrogance, which is used by 5 people. We can edit this label if we wish.
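The merge described above can be sketched in code as a lookup table that maps each variant to the word it should be counted under. This is only an illustration under stated assumptions: the MERGES table, the function merged_counts, and the sample responses are hypothetical, and the sketch says nothing about how Displayr does it internally.

```python
import re
from collections import Counter

# Hypothetical merge table: variant spelling -> word to count it under.
MERGES = {"arrogent": "arrogance"}

def merged_counts(responses, merges):
    """Count responses per word after replacing merged variants."""
    counts = Counter()
    for text in responses:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Map variants first, then de-duplicate within the response,
        # so a person who used both spellings is counted once.
        words = {merges.get(token, token) for token in tokens}
        counts.update(words)
    return counts

# Made-up responses, not the survey data.
responses = ["So arrogent", "His arrogance", "arrogent, pure arrogance"]
counts = merged_counts(responses, MERGES)
print(counts["arrogance"])
```

Mapping before de-duplicating matters when counts are per person: simply adding the two words' counts together would count the third respondent twice.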
Viewing the original text
By clicking on the magnifying glass next to the words (e.g., to the right of arrogent, in the image above), we can see the original text. This allows us to confirm whether it is appropriate to merge the words.
Often when we create a word cloud we need to add a phrase. For example, on the word cloud you can see that Tom and Cruise appear as separate words. If you click on Tom, you will see that 23 of its appearances are as Tom Cruise. To get Displayr to include a phrase, click on the word you want to change (e.g., Tom) and then edit the name in the field on the top-left, remembering to press Enter on your keyboard. Below, you can see what happens: we have automatically created a phrase called Tom Cruise, and it contains the 23 Tom Cruise responses and the one Tom response. If we want to split off the solo Tom as its own word on the word cloud, we click the cross next to it.
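To make the distinction between the phrase and the solo word concrete, here is a rough sketch that separates occurrences of one word into those followed by a second word and those standing alone. The function phrase_split and the sample responses are hypothetical and are not taken from Displayr or from the survey data.

```python
import re

def phrase_split(responses, first="tom", second="cruise"):
    """Return (phrase_count, solo_count) for occurrences of `first`."""
    phrase = solo = 0
    for text in responses:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token != first:
                continue
            # Part of the phrase only if the very next token is `second`.
            if i + 1 < len(tokens) and tokens[i + 1] == second:
                phrase += 1
            else:
                solo += 1
    return phrase, solo

# Made-up responses, not the survey data.
responses = [
    "Tom Cruise is weird",
    "I just don't like Tom",
    "tom cruise jumped on a couch",
]
result = phrase_split(responses)
print(result)  # (2, 1)
```

Run on the real responses, a count of this kind would correspond to the 23 Tom Cruise appearances and the one solo Tom that Displayr reports.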
You can try this for yourself in Displayr.
In my next post, I discuss how you can use text analytics to automate the whole tidying-up process of adding phrases and the like.
Author: Tim Bock
Tim Bock is the founder of Displayr. Tim is a data scientist who has consulted, published academic papers, and won awards for problems and techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca-Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q (www.qresearchsoftware.com), a data science product designed for survey research, which is used by all of the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.
Also published on Medium.