Extracting Data from WordPress Using Displayr

Importing data from the WordPress API is simple in Displayr. You can connect to WordPress directly to bring in data about your blog posts. You can then use this to create visualizations or a dashboard to show the progress of your blog. In this post, I'll show you how to obtain a list of all your published posts and dates from your WordPress site using a little bit of R code in Displayr.

There are three main stages to getting data about posts into Displayr.

  1. Connect to the WordPress API.
  2. Count up how many posts are in the account.
  3. Retrieve the data for each of the posts and arrange it in a sensible format.

I break down the key steps below, but you can paste all the code as a single block to create an R Data Set. To do this:

  1. Click Home > New Data Set.
  2. Choose the R option.
  3. Paste in the code and modify it to suit your site.
  4. Enter a Name.
  5. Click OK.

You can also use the same code in an R Output (Insert > R Output) to explore and check it before adding the data set.

Connecting to the WordPress API

The httr R package is useful for making HTTP requests and is already installed in Displayr. The easiest way to extract this information is to make a GET request using the code below, substituting www.yoursite.com with the address of your website:

library(httr)
site = GET("http://www.yoursite.com/wp-json/wp/v2/posts")

You may also need to use “https” rather than “http”.
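
Before going further, it can be worth confirming that the request actually succeeded. The following is a minimal sketch rather than part of the original code: posts_endpoint and fetch_posts are hypothetical helper names, and it assumes your site exposes the same /wp-json/wp/v2/posts route used above. stop_for_status is an httr function that raises an R error when the response status is 4xx or 5xx.

```r
# Hypothetical helper: build the posts endpoint for a site address,
# dropping any trailing slash so the path is not doubled up.
posts_endpoint <- function(site_url) {
  paste0(sub("/+$", "", site_url), "/wp-json/wp/v2/posts")
}

# Hypothetical helper: request the endpoint and stop with an error
# if the server answers with a 4xx or 5xx status.
fetch_posts <- function(site_url) {
  response <- httr::GET(posts_endpoint(site_url))
  httr::stop_for_status(response)
  response
}

# Usage (replace the address with your own site):
# site = fetch_posts("https://www.yoursite.com")
```

If the call stops with an error, switching between “http” and “https” is a reasonable first thing to try.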

Finding the maximum number of posts

WordPress’s API has a few built-in defaults and restrictions to prevent large calls being made. The first is that at most 100 posts can be returned per page; the default is 10. To retrieve all our posts, we first need to work out how many we have. By default, the API returns only published content. The total number of posts is available from the response we stored as site, in the header x-wp-total. As we are going to request 100 posts per page rather than the default number, we divide the total by 100 and round up to get the correct number of pages:

x = headers(site)$`x-wp-total`
maxPages = ceiling(as.numeric(x)/100)
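
To make the arithmetic concrete, here is a small sketch of the same calculation wrapped in a function. The name pages_needed and the example totals are made up for illustration. The header value arrives as text, which is why it is converted with as.numeric first.

```r
# Hypothetical helper: number of pages required to fetch every post.
# total_posts may be a character string, as returned by headers().
pages_needed <- function(total_posts, per_page = 100) {
  ceiling(as.numeric(total_posts) / per_page)
}

pages_needed("250")  # 3: two full pages plus one page of 50
pages_needed("100")  # 1: exactly one full page
pages_needed("101")  # 2: one extra post still needs its own page
```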

Doing the call using R

Next, we will loop the API call to iterate as many times as maxPages tells us:

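# List to hold one data frame per page of results
dflist = list()

# seq_len() gives an empty loop if maxPages is 0, unlike 1:maxPages
for (i in seq_len(maxPages)) {
  # Request page i, asking for up to 100 posts per page
  post = GET(paste0("http://www.yoursite.com/wp-json/wp/v2/posts?page=", i, "&per_page=100"))
  data = content(post)
  # Pull the date and the rendered title out of each post
  Date = sapply(data, function(x) return(x$date))
  Title = sapply(data, function(x) return(x$title$rendered))
  df = data.frame(Date, Title)
  dflist[[i]] = df
}
# Stack the per-page data frames into a single data frame
posts = do.call(rbind, dflist)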

In this code:

  • An easy way to store the looped data is as a list and then combine it together afterward. Here, we have defined an empty list as dflist.
  • We set the loop to iterate from 1 to maxPages.
  • We then concatenate the site name with the page number (i.e. loop iteration) and set per_page as 100.
  • Next, we extract the content of the GET response using the content function and store it as data.
  • In the next section, we specify which fields we want to extract using sapply. The name on the left is the name you will give the field in your data set (e.g. Date) and the name within the return function reflects the WordPress API name (e.g. $date). Field names can be found by calling content on the GET response.
  • We then combine the extracted fields into a data frame called df.
  • This data frame is then stored in dflist at the position of the current iteration.
  • Finally, we use the do.call command to combine all the rows of the list using rbind.
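
The paging and combining steps described above can be tried without calling the API at all. The sketch below is only an illustration: page_url is a hypothetical helper that mirrors the paste0 call in the loop, the page count of 3 is arbitrary, and the two small data frames stand in for real per-page results.

```r
# Hypothetical helper mirroring the URL built inside the loop.
page_url <- function(i, per_page = 100) {
  paste0("http://www.yoursite.com/wp-json/wp/v2/posts?page=", i,
         "&per_page=", per_page)
}

# The URLs the loop would request if maxPages were 3.
urls <- vapply(seq_len(3), page_url, character(1))

# Made-up per-page results, one data frame per page.
dflist <- list(
  data.frame(Date = "2019-01-15T09:30:00", Title = "First post",
             stringsAsFactors = FALSE),
  data.frame(Date = "2019-04-02T14:00:00", Title = "Second post",
             stringsAsFactors = FALSE)
)

# rbind stacks the rows of every data frame in the list.
posts <- do.call(rbind, dflist)
```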

You now have an imported data set with the date and blog title. Other fields can be added to this. For more information, see WordPress API – Post References.
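
As one example of adding a field, each post object returned by the API also carries a link field holding the post's URL. The sketch below uses a small made-up list standing in for content(post), so the values are illustrative only. It uses vapply in place of sapply purely because vapply checks that every extracted value is a single string; this is a design choice, not a requirement.

```r
# Made-up stand-in for content(post): a list with one element per post.
data <- list(
  list(date = "2019-01-15T09:30:00",
       link = "https://www.yoursite.com/first-post/",
       title = list(rendered = "First post")),
  list(date = "2019-04-02T14:00:00",
       link = "https://www.yoursite.com/second-post/",
       title = list(rendered = "Second post"))
)

# Extract one value per post for each field of interest.
Date  <- vapply(data, function(x) x$date, character(1))
Title <- vapply(data, function(x) x$title$rendered, character(1))
Link  <- vapply(data, function(x) x$link, character(1))

# One row per post, one column per extracted field.
df <- data.frame(Date, Title, Link, stringsAsFactors = FALSE)
```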

Visualizing count data

Finally, we can visualize the count of this year’s posts by quarter using Displayr’s pictograph chart function. To do so:

  1. Select the Date variable under Data Sets in the bottom left.
  2. In the Object Inspector on the right, change the Properties > INPUTS > Structure setting to Date/Time. This will create a new copy of the variable, formatted as a date.
  3. Click the Date/Time button under Properties > INPUTS.
  4. Change Aggregation to 1 Quarter. You could also choose to aggregate by month, week, and so on.
  5. Select Insert > Visualization > Pictograph > Bar Chart.
  6. In the Object Inspector on the right, select the new ‘date’ variable in Variables in ‘Data’ under Inputs > DATA SOURCE, then click Automatic to create an easy stickman pictograph!

 

As in the above example, you can additionally adjust the icon groupings on the Chart tab by setting the Units per icon (scale) to 5 under APPEARANCE, change Color palette to Strong colors under DATA SERIES and set a larger font size for DATA LABELS and CATEGORIES (Y) AXIS.
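
If you would like to sanity-check the quarterly counts in R before or after building the chart, something along these lines should work in an R Output. It is a rough sketch with made-up dates, and it assumes the Date column holds text in the form 2019-01-15T09:30:00, so only the first ten characters are used.

```r
# Made-up dates in the text format assumed for the Date column.
Date <- c("2019-01-15T09:30:00", "2019-02-20T11:00:00", "2019-04-02T14:00:00")

# Keep the YYYY-MM-DD part and convert it to R's Date class.
post_dates <- as.Date(substr(Date, 1, 10))

# Label each post with its year and quarter, e.g. "2019 Q1".
quarter_labels <- paste(format(post_dates, "%Y"), quarters(post_dates))

# Count how many posts fall into each quarter.
posts_per_quarter <- table(quarter_labels)
```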

I hope you found this post helpful! Find out more ways you can use Displayr. 

About Oliver Harrison

After completing a PhD in German history and literature, Oliver swapped old dusty books for computer screens and logic. He then enjoyed the next 10 years as a survey programmer and data analyst in the Australasian market research industry. Today Oliver is passionate about problem-solving and helping customers achieve their goals as a member of the Customer Success team at Displayr.