R | 24 August 2017 | by Jake Hoare

Analyzing Google Trends Data in R

Google Trends shows how the popularity of search terms changes over a given period (i.e., relative search interest over time). It can be used to find search terms with growing or declining popularity, or to review past periodic variations such as seasonality. In R, Google Trends data can be combined with other analyses, manipulated, and explored in more detail.

This post describes how you can use R to download data from Google Trends and then include it in a chart or other analysis. We'll first discuss how to get overall (global) data for a search term (query), then how to plot it as a simple line chart, and finally how to break the data down by geographical region. The first example I will look at is the rise and fall of the Blu-ray.


Analyzing Google Trends in R

I have never bought a Blu-ray disc and probably never will. In my world, technology moved from DVDs to streaming without the need for a high-definition physical medium. I still see them in some shops, but it feels as though they are declining. Using Google Trends, we can find out when interest in Blu-rays peaked.

The following R code retrieves the global search history since 2004 for Blu-ray.

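library(gtrendsR)   # Google Trends API client
library(reshape2)   # provides dcast() for long-to-wide reshaping

# Worldwide web searches for "blu-ray" since 2004; the first element of the
# returned list is the interest-over-time data frame
google.trends = gtrends(c("blu-ray"), gprop = "web", time = "all")[[1]]

# One row per date, one column per keyword/geo combination
google.trends = dcast(google.trends, date ~ keyword + geo, value.var = "hits")

# Use the dates as row names and drop the now-redundant date column
rownames(google.trends) = google.trends$date
google.trends$date = NULL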

The first argument to the gtrends function is a vector of up to 5 search terms; in this case, there is just one. The second argument, gprop, is the medium searched on and can be any of "web", "news", "images" or "youtube". The third argument, time, can be any of "now 1-d", "now 7-d", "today 1-m", "today 3-m", "today 12-m", "today+5-y" or "all" (which means since 2004). Finally, time can also be a custom date range, e.g. "2010-12-31 2011-06-30".
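Because the custom time string and the five-keyword limit are easy to get wrong, here is a minimal sketch of two small helpers for preparing gtrends arguments. Both trends_time_range and batch_keywords are hypothetical names of my own, not part of gtrendsR, and the sketch assumes the "YYYY-MM-DD YYYY-MM-DD" format shown above.

```r
# Hypothetical helpers (not part of gtrendsR) for building gtrends() arguments.

# Build a custom "YYYY-MM-DD YYYY-MM-DD" string for the time argument.
trends_time_range = function(from, to) {
  from = as.Date(from)
  to = as.Date(to)
  if (from >= to) stop("'from' must be earlier than 'to'")
  paste(format(from, "%Y-%m-%d"), format(to, "%Y-%m-%d"))
}

# Split a longer vector of search terms into batches of at most 5,
# since gtrends() accepts up to 5 keywords per request.
batch_keywords = function(terms, size = 5) {
  unname(split(terms, ceiling(seq_along(terms) / size)))
}

trends_time_range("2010-12-31", "2011-06-30")  # returns "2010-12-31 2011-06-30"
batch_keywords(c("dvd", "blu-ray", "netflix", "vhs", "laserdisc", "hd dvd"))
```

Bear in mind that each request is rescaled so that its own maximum is 100, so numbers from different batches are not directly comparable with each other.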

Note that I am using gtrendsR version 1.9.9.0, which can be installed from GitHub with the following code. This version improves upon the CRAN version 1.3.5 (as of September 2017) by not requiring a login.

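# Install devtools if needed, then install gtrendsR from GitHub
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("PMassicotte/gtrendsR")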

You may see a warning if your timezone is not set – this can be avoided by adding the following line of code:

Sys.setenv(TZ = "UTC")

After retrieving the data from Google Trends, I format it into a table with dates for the row names and search terms along the columns. The table below shows the result of running this code.
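To make the dcast step less of a black box, here is a minimal base-R sketch of the same long-to-wide reshape. It runs on a small made-up table that only mimics the date, hits, keyword and geo columns (the values and the "world" label are illustrative, not real Google Trends output), and it assumes there is one row per date/keyword/geo combination.

```r
# Toy long-format table shaped like the interest-over-time data frame
# (columns date, hits, keyword, geo); the numbers are made up.
long = data.frame(
  date = as.Date(rep(c("2008-06-01", "2008-12-01"), times = 2)),
  hits = c(50, 100, 30, 60),
  keyword = rep(c("blu-ray", "dvd"), each = 2),
  geo = "world",
  stringsAsFactors = FALSE
)

# Base-R counterpart of dcast(long, date ~ keyword + geo, value.var = "hits"),
# assuming one row per date/keyword/geo combination: dates become row names,
# keyword_geo combinations become columns.
wide = tapply(long$hits,
              list(format(long$date), paste(long$keyword, long$geo, sep = "_")),
              sum)
wide = as.data.frame(wide)
wide
```

The column names follow the same keyword_geo pattern that dcast produces, so code written against one table should work against the other.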


Plotting Google Trends data: Identifying seasonality and trends

Plotting the Google Trends data as an R chart, we can draw two conclusions. First, interest peaked around the end of 2008. Second, there is a strong seasonal effect, with significant spikes around Christmas every year.
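A simple way to draw such a chart, and to check the seasonal pattern numerically, is sketched below in base R. The series is made up (a baseline of 50 with a jump to 100 every December) and stands in for the Blu-ray column; with the real data you would use the row names and column of google.trends instead.

```r
# Toy monthly series standing in for the Blu-ray column (made-up values):
# a baseline of 50 with a jump to 100 every December.
dates = seq(as.Date("2005-01-01"), as.Date("2008-12-01"), by = "month")
hits = ifelse(format(dates, "%m") == "12", 100, 50)

# Simple line chart of search interest over time
plot(dates, hits, type = "l", xlab = "Date", ylab = "Search interest (0-100)")

# Seasonal profile: average interest for each calendar month across years
monthly.profile = tapply(hits, format(dates, "%m"), mean)
peak.month = names(which.max(monthly.profile))
peak.month  # "12"
```

Averaging by calendar month is only a rough check; for a more formal split into trend and seasonal components, stats::stl() can be applied to a monthly ts object.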


Note that results are relative: each point is the number of searches for the term divided by the total number of searches at that time, scaled so that the maximum is 100. We cannot infer anything about the absolute volume of Google searches. But we can say that, as a proportion of all searches, Blu-ray was about half as frequent in June 2008 as in December 2008. An explanation of the Google Trends methodology is here.
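The arithmetic behind that "half as frequent" reading can be illustrated with a simplified sketch. The search counts below are invented, and Google also samples and rounds its data, so this shows the idea of the scaling rather than Google's exact procedure.

```r
# Simplified sketch of the scaling, using made-up counts:
# share of all searches at each time point, rescaled so the peak is 100.
term.searches  = c(jun.2008 = 60,   dec.2008 = 100)
total.searches = c(jun.2008 = 1200, dec.2008 = 1000)

share = term.searches / total.searches   # 0.05 and 0.10
index = round(100 * share / max(share))  # jun.2008 = 50, dec.2008 = 100
index
```

The raw counts differ by a factor of 0.6, but because total search volume also changed, the index shows June at half of December. This is why the chart supports statements about relative popularity but not about absolute volume.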


Google Trends by geographic region

Next, I will illustrate the use of country codes. To do so, I will find the search history for skiing in Canada and New Zealand. I use the same code as before, modifying only the gtrends line as below.

google.trends = gtrends(c("skiing"), geo = c("CA", "NZ"), gprop = "web", time = "2010-06-30 2017-06-30")[[1]]

The new argument to gtrends is geo, which allows the user to specify geographic codes to narrow the search region. The awkward part about geographical codes is that they are not always obvious. Country codes consist of two letters, for example, CA and NZ in this case. We could also use region codes such as US-CA for California. I find the easiest way to get these codes is to use this Wikipedia page.
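Because a mistyped code is easy to miss, a quick format check before sending a request can help. The sketch below uses a hypothetical helper, is_geo_code (not part of gtrendsR), and assumes only the two formats described above: a two-letter uppercase country code, optionally followed by a hyphen and a short subdivision code. It checks the format only, not whether Google Trends recognises the code.

```r
# Hypothetical helper (not part of gtrendsR): checks only the format of a
# geo code, e.g. "CA", "NZ" or "US-CA", not whether Google Trends accepts it.
# Assumes uppercase letters, as in the examples in this post.
is_geo_code = function(codes) {
  grepl("^[A-Z]{2}(-[A-Z0-9]{1,3})?$", codes)
}

is_geo_code(c("CA", "NZ", "US-CA", "usa", "US-"))
# TRUE TRUE TRUE FALSE FALSE
```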

An alternative way to find all the region-level codes for a given country is to use the following snippet of R code. In this case, it retrieves all the regions of Italy (IT).

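library(gtrendsR)   # also loads the countries lookup table
# Keep the sub_code values whose first two letters are the country code "IT"
geo.codes = sort(unique(countries[substr(countries$sub_code, 1, 2) == "IT", ]$sub_code))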
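To see what that filter does, here is the same logic run on a tiny made-up stand-in for the countries table (only the sub_code column is mimicked, and the rows are invented). Filtering out missing values explicitly avoids the all-NA rows that indexing a data frame with NA produces; in the line above those are dropped anyway, because sort() removes NA by default.

```r
# Toy stand-in for gtrendsR's countries table (made-up rows),
# including a missing sub_code and a duplicate.
toy.countries = data.frame(
  sub_code = c("IT-21", "IT-25", NA, "NZ-AUK", "IT-21"),
  stringsAsFactors = FALSE
)

# Keep codes starting with "IT-", drop NAs and duplicates, then sort
codes = toy.countries$sub_code
it.codes = sort(unique(codes[!is.na(codes) & startsWith(codes, "IT-")]))
it.codes  # "IT-21" "IT-25"
```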

Plotting the ski data below, we note the contrast between northern and southern hemisphere winters. Skiing is also relatively more popular in Canada than in New Zealand. The 2014 Winter Olympics caused a notable spike in both countries, but particularly in Canada.
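The hemisphere contrast can also be checked numerically, not just by eye. The sketch below uses two made-up monthly series, one peaking in January (standing in for Canada) and one peaking in July (standing in for New Zealand), finds each series' peak month, and shows that opposite seasons appear as a strongly negative correlation. With the real data, the skiing_CA and skiing_NZ columns produced by dcast would take their place.

```r
# Made-up monthly series: "Canada" peaks in January,
# "New Zealand" peaks in July (six months later).
dates = seq(as.Date("2011-01-01"), as.Date("2016-12-01"), by = "month")
month = as.integer(format(dates, "%m"))
ca = 50 + 40 * cos(2 * pi * (month - 1) / 12)
nz = 20 + 15 * cos(2 * pi * (month - 7) / 12)

# Calendar month with the highest average interest
peak = function(x) names(which.max(tapply(x, format(dates, "%m"), mean)))
c(CA = peak(ca), NZ = peak(nz))  # CA "01", NZ "07"

# Opposite seasons show up as a strongly negative correlation
cor(ca, nz)  # approximately -1
```

Real search data will be noisier, and one-off events such as the Olympics spike will weaken the correlation, but its sign is a quick summary of whether two regions share a season.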



Create your own analysis

In this post, I have shown how to import data from Google Trends using the R package gtrendsR. Anyone can click on this link to explore the examples used in this post or create your own analysis (just sign in to Displayr first).

Author: Jake Hoare

After escaping from physics to a career in banking, then escaping from banking, I decided to go back to BASIC and study computing. This led me to rediscover artificial intelligence and data science. I now get to indulge myself at Displayr working in the Data Science team, often on machine learning.




5 Comments

  1. Teemu

    Hi Jake,

    thanks for the interesting article. Question about gtrendsR version 1.9.9.0 – where can you find this as I still have the version 1.3.5?

    thanks!


    • Gaurav Jain

      Thanks for reading my article. v 1.9.9.0 is available at https://github.com/PMassicotte/gtrendsR, so instead of CRAN use the R commands library(devtools) then install_github("PMassicotte/gtrendsR"). It might be a little unstable because it is not formally released, but I have not found any major problems. The big advantage over 1.3.5 is removing the password-based authentication.


      • Teemu Niskanen

        Thanks, that worked. For some reason I thought 1.9.9.0 would be a release candidate, so I tried:

        install.packages("drat")       # easier repo access + creation
        drat:::add("ghrr")             # make it known
        install.packages("gtrendsR")   # install it

        but as you helpfully pointed out, I should use:

        if (!require("devtools")) install.packages("devtools")
        devtools::install_github("PMassicotte/gtrendsR")


  2. mimo

    Hi,
    I get this error:
    Error: Not enough search volume. Please change your search terms.

    Is this normal?
    I ran the code on 2 PCs and got the same error.


    • Gaurav Jain

      Hi, are you using version 1.9.9.0 of gtrendsR? I have heard that people can get that error when using an earlier version, such as the CRAN version (1.3.5). You can install 1.9.9.0 with devtools::install_github("PMassicotte/gtrendsR").

