Tutorial: Smartphone Marketing – What’s in a Name?
Want to run your own text analysis in Displayr? This is how we analyzed the web pages of different smartphone models to find out what the data reveals about their marketing.
Make sure you read our blog post "What’s in a Name? A Data Science Analysis of Smartphone Marketing" before reading on!
Creating the first table
Add your data as a dataset in Displayr. In the Data Sets section on the left, click Insert a Data Set. You can then drag and drop your data, select a file from your computer, or use another source such as a website. My dataset was in an Excel spreadsheet, so I imported that file and Displayr split it into variables for me. From here, there are two ways to create a table. The easiest is to drag and drop the data onto the document. Alternatively, click Insert > Paste Table to paste data from a spreadsheet, or Insert > Enter Table to type it in manually.
Once the data is in Displayr, we can analyze it. In this case, I wanted to find the most common words used on each of the three web pages. To run a text analysis, go to Insert > More > Text Analysis > Setup Text Analysis, drag and drop the dataset you want to analyze, and press Calculate.
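If you'd like to see the idea behind the word frequency table outside Displayr, here's a minimal Python sketch. It is not Displayr's implementation: the tokenizing regex, the short `STOPWORDS` set, and the `word_frequencies` helper are all assumptions made for illustration. It simply lowercases the text, splits it into words, drops common stopwords, and counts what's left.

```python
import re
from collections import Counter

# Assumed, abbreviated stopword list for illustration; Displayr's actual list may differ.
STOPWORDS = {"the", "and", "is", "to", "a", "an", "of", "in", "on", "with", "for"}


def word_frequencies(text: str) -> Counter:
    """Count each non-stopword in `text`, ignoring case (hypothetical helper)."""
    # Lowercase, then pull out runs of letters, digits and apostrophes.
    # Commas are not part of the pattern, so "1,000,000" splits into "1", "000", "000".
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(token for token in tokens if token not in STOPWORDS)


if __name__ == "__main__":
    page = "The iPhone X is the iPhone with Face ID. Over 1,000,000 sold."
    print(word_frequencies(page).most_common(3))
```

The way the regex splits on commas also shows how a token like "000" can end up in a frequency table in the first place.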
This produces a table of the words used most frequently on the iPhone X page. However, the table is very short, because by default it only displays words that appear at least 5 times. Since our sample of text is quite small, I want to display words that appear at least 3 times. To do this, change Minimum Frequency under Inputs to 3 (or whatever threshold suits your data).
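Conceptually, Minimum Frequency is just a threshold applied to the word counts. Here's a minimal sketch of that filter, assuming the counts are held in a Python `collections.Counter`. The `filter_min_frequency` name is hypothetical, its default of 5 mirrors the default described above, and the counts in the example are made up.

```python
from collections import Counter


def filter_min_frequency(counts: Counter, min_frequency: int = 5) -> list[tuple[str, int]]:
    """Return (word, count) pairs with count >= min_frequency, most frequent first."""
    return [(word, n) for word, n in counts.most_common() if n >= min_frequency]


# Made-up counts for illustration only.
counts = Counter({"iphone": 9, "display": 5, "camera": 4, "face": 3, "glass": 2})
print(filter_min_frequency(counts))     # default threshold of 5
print(filter_min_frequency(counts, 3))  # lowered threshold, as in this tutorial
```

Lowering the threshold only adds rows at the bottom of the table; the ordering by frequency stays the same.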
Now we have 24 entries instead of 8, but a few of them are irrelevant to our analysis and need to be removed. Displayr automatically excludes common words such as "the", "and", "is", and "to". In this table, however, "000" appears 3 times because it comes from large numbers such as 1,000,000. This obviously isn't useful, so we want to remove it. To do this, go to the Inputs section and type the words or phrases you want to remove into the Remove words/phrases box, separated by commas. We're also going to remove "cent", which comes from the phrase "per cent". This leaves us with the 22 words that are used 3 or more times, sorted by frequency. Repeat this process for each of the datasets.
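The Remove words/phrases box takes a comma-separated list, and the effect is to drop those entries from the counts. A minimal sketch of the same idea follows. The `remove_words` helper is hypothetical, it handles single words only (removing multi-word phrases would require matching them before counting), and the counts are made up.

```python
from collections import Counter


def remove_words(counts: Counter, remove: str) -> Counter:
    """Drop every word listed in a comma-separated string such as "000, cent"."""
    # Split on commas, trim whitespace, lowercase, and skip empty entries.
    unwanted = {item.strip().lower() for item in remove.split(",") if item.strip()}
    return Counter({word: n for word, n in counts.items() if word not in unwanted})


# Made-up counts for illustration only.
counts = Counter({"iphone": 9, "display": 5, "000": 3, "cent": 3})
print(remove_words(counts, "000, cent"))
```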
Creating the word clouds
To create a word cloud, go to Insert > Charts > Word Cloud. Drag the dataset you want to visualize from the Data Sets section, or paste or type your data into the spreadsheet under Inputs. Click Calculate to show your word cloud.
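A word cloud shows frequency through text size: the more often a word appears, the bigger it is drawn. This tutorial doesn't cover Displayr's exact sizing rule, so the sketch below simply assumes linear scaling between a hypothetical `min_size` and `max_size` (in points). The `font_sizes` helper and the counts are illustrative only.

```python
def font_sizes(counts: dict[str, int], min_size: float = 10.0,
               max_size: float = 48.0) -> dict[str, float]:
    """Map each word's count linearly onto a font size in [min_size, max_size]."""
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    spread = highest - lowest
    sizes = {}
    for word, n in counts.items():
        if spread == 0:
            # Every word is equally frequent, so give them all the largest size.
            sizes[word] = max_size
        else:
            sizes[word] = min_size + (n - lowest) / spread * (max_size - min_size)
    return sizes


# Made-up counts for illustration only.
print(font_sizes({"iphone": 10, "display": 4, "camera": 1}))
```

Other scalings (for example, square-root or logarithmic) are also common when one word dominates the counts.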
Now to clean it up! The first step is to remove any irrelevant words, just as we did for the word frequency table. Here, though, you do it by dragging each word you want to delete onto the Ignore tab on the right side of the word cloud. I'm going to delete "000", "It's", "Just", "per", "cent", "that's", and "instead" from our word cloud.
Next, I want to combine some of the words. For example, "iPhone" and "X" appear as separate words in the cloud. When we click on "X", we can see that it only ever appears immediately after "iPhone". To combine them, drag "X" onto "iPhone". This essentially tells Displayr to treat "iPhone" and "X" as synonyms, so together they stand for "iPhone X".
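Under the hood, combining two words means folding one entry into another. Here's a minimal sketch. It assumes the combined entry's count is the sum of both words' counts, which is an assumption: this tutorial doesn't say how Displayr weights a merged entry. The `merge_words` helper and the counts are hypothetical.

```python
from collections import Counter


def merge_words(counts: Counter, merge: str, into: str) -> Counter:
    """Fold the count for `merge` into `into`, leaving a single combined entry."""
    merged = Counter(counts)  # copy so the caller's counts are untouched
    if merge != into:
        merged[into] += merged.pop(merge, 0)
    return merged


# Made-up counts for illustration only.
counts = Counter({"iphone": 7, "x": 4, "display": 5})
print(merge_words(counts, merge="x", into="iphone"))
```

One consequence of summing is that, because "X" always follows "iPhone", each "iPhone X" is counted twice. Treating the pair as a single phrase avoids that.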
We can also create phrases in our word cloud. For example, "Face" and "ID" appear as separate words, but "ID" only ever appears in the phrase "Face ID". To add this as a phrase, click on "ID" and type "Face ID" as the new phrase. The word cloud then updates to show "Face ID" as a single entry.
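Adding a phrase means that a run of words is counted as one token instead of several. Here's a minimal sketch of that idea, using greedy left-to-right matching in which the first listed phrase that fits wins. The `count_with_phrases` helper and the tokenizing regex are assumptions for illustration, and stopword removal is left out to keep it short.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def count_with_phrases(text: str, phrases: list[str]) -> Counter:
    """Count words in `text`, treating each listed multi-word phrase as one token."""
    tokens = TOKEN_RE.findall(text.lower())
    # Tokenize each phrase the same way as the text; skip blank phrases.
    phrase_words = [TOKEN_RE.findall(p.lower()) for p in phrases]
    phrase_words = [words for words in phrase_words if words]
    out = []
    i = 0
    while i < len(tokens):
        for words in phrase_words:
            if tokens[i:i + len(words)] == words:
                out.append(" ".join(words))  # e.g. "face id" as a single token
                i += len(words)
                break
        else:
            # No phrase starts at this position, so keep the single word.
            out.append(tokens[i])
            i += 1
    return Counter(out)


print(count_with_phrases("Face ID unlocks it. Set up Face ID once.", ["Face ID"]))
```

Unlike merging, this leaves "Face" and "ID" with no separate counts of their own, because they only ever occur inside the phrase.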
That's how we created our word clouds!