Text data often refers to entities, such as people, organizations, or places. These entities can be automatically extracted from text data, and then used in further analyses. As an example, in this post I reanalyze a famous set of tweets by a candidate for the US Presidency in 2016 and look at how the sentiment of each tweet relates to who is named in it.

Extracting the entities

In Displayr, we extract the entities by clicking Insert > Text Analysis > Automatic Categorization > Entity Extraction and then selecting the text variable of interest. After a bit of a wait, you get the output below. You can expand each group to see what's been found.
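
If you want a feel for what entity extraction does outside Displayr, here is a minimal Python sketch. It is not how Displayr works internally; it simply looks up names from a small hand-made list (a "gazetteer"). The entity names, the tweets, and the function extract_entities are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: entity type -> known names (all invented for this sketch).
GAZETTEER = {
    "Person": ["Alice Smith", "Bob Jones"],
    "Organization": ["Acme Corp"],
    "Place": ["Springfield"],
}


def extract_entities(text):
    """Return {entity_type: [names found]} using whole-word, case-insensitive matching."""
    found = defaultdict(list)
    for entity_type, names in GAZETTEER.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found[entity_type].append(name)
    return dict(found)


# Made-up tweets, not the real data set analyzed in this post.
tweets = [
    "Alice Smith was terrible in Springfield last night.",
    "Great meeting with Acme Corp today!",
    "Bob Jones and Alice Smith are both wrong.",
]

for tweet in tweets:
    print(extract_entities(tweet))
```

A real extractor recognizes names it has never seen before, which a fixed list cannot do, so treat this only as a picture of the input and output.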

Saving the entities as variables

The next step is to save the entities as variables in the data file. This is done by selecting the output, and then clicking Insert > Text Analysis > Advanced > Save Variables > Categories.
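
To make "saving entities as variables" concrete, here is a rough Python sketch of one plausible layout: one 0/1 column per entity, with one row per tweet. I am assuming this indicator layout for illustration; the exact structure Displayr saves may differ. The names and the function to_indicator_rows are hypothetical.

```python
# Hypothetical input: the Person entities found in each tweet (made-up names).
people_per_tweet = [
    ["Alice Smith"],
    [],
    ["Bob Jones", "Alice Smith"],
]


def to_indicator_rows(entities_per_row):
    """Turn per-row entity lists into one 0/1 column per distinct entity."""
    # Sort the distinct names so the column order is stable.
    categories = sorted({name for row in entities_per_row for name in row})
    rows = [
        {name: int(name in row) for name in categories}
        for row in entities_per_row
    ]
    return categories, rows


categories, indicator_rows = to_indicator_rows(people_per_tweet)
print(categories)
for row in indicator_rows:
    print(row)
```

Once each entity is its own column, it can be crossed with any other variable in the data file.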

Comparing sentiment by entities

OK, so within the Person entity, we've worked out who is mentioned in the tweets. What next?

We can crosstab this by other information. In the example below, I've computed the sentiment of the tweets (Insert > Text Analysis > Sentiment) and crossed it with the items within the Person entity. Each average is the mean sentiment score of the tweets that mention that name. Scores below 0 indicate negative sentiment. Scores in red indicate statistically significant low sentiment. You can probably work out who sent the tweets!
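
The crosstab boils down to averaging a sentiment score over the tweets that mention each person. Here is a minimal Python sketch of that calculation. The sentiment scoring is a toy word count that I am assuming for illustration, not Displayr's sentiment method, and the sketch does no significance testing. The word lists, tweets, and function names are all made up.

```python
import re
from statistics import mean

# Toy sentiment lexicon (an assumption for this sketch, not Displayr's word list).
POSITIVE = {"great", "good", "wonderful"}
NEGATIVE = {"terrible", "wrong", "bad"}


def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Hypothetical tweets, each paired with the people it mentions.
tweets = [
    ("Alice Smith was terrible and wrong last night.", ["Alice Smith"]),
    ("Great rally today, wonderful crowd!", []),
    ("Bob Jones did a good job, unlike Alice Smith, who was bad.",
     ["Bob Jones", "Alice Smith"]),
]


def mean_sentiment_by_person(tagged_tweets):
    """Average the sentiment score over the tweets that mention each person."""
    scores = {}
    for text, people in tagged_tweets:
        for person in people:
            scores.setdefault(person, []).append(sentiment_score(text))
    return {person: mean(values) for person, values in scores.items()}


result = mean_sentiment_by_person(tweets)
for person, average in result.items():
    print(person, average)
```

A tweet that mentions two people counts toward both averages, which is why one mixed tweet can pull two names in the same direction.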