This blog post will take you through how to use basic R code to make more complicated filters.
At first glance some of the R code may look scary to those of you who have never used it before but it offers a lot of flexibility. We have already covered how to create and connect filters to a control box. The following deals with other common scenarios that can be easily updated to fit your requirements.
The process to create an R filter is to go to Insert > R > Numeric Variable, add the condition in the R CODE window and then select Usable as a filter.
The R Basics
Before we begin, let's cover some variable and question referencing and condition operators.
With R you can reference either the variable name or the question name/label. If the question name has spaces in it, however, you must wrap it in back ticks. As an example, I have a single variable called Q1 with the label of "Q1. How old are you?". To reference the variable, I would use Q1 but to reference the question it would be `Q1. How old are you?`. In this case, using either option produces the same result but the distinction can be important as a question could be made up of multiple variables.
The dimensions of a question are question[rows, columns, drop=False] whereby you may choose to specify or exclude specific columns from being included in the columns argument. In order to keep the original dimensions, we add drop=False or drop=F as an additional argument.
If you are referencing a single column, you can do so using the column name, e.g. "Coca-Cola". You can also use column index but if you change the order of items in the corresponding table of this question, this number may change. If you are referencing multiple columns then wrap them in c() such as in c("Coca-Cola","Pepsi").
The main condition operators are as follows:
Filters based on single-response questions
When dealing with single variable questions, whether numeric or categorical, it is easy to set up basic binary 'Yes' or 'No' conditions.
1. 'Yes' Filter
Q1 is 1:
Q1 is less than 5 OR Q2 is 2:
Q1<5 | Q2==2
Q1 is greater than or equal to 25 AND less than or equal to 100:
Q1>=25 & Q1<=100
2. 'No' filter
And here are the reverse of the above filter conditions using the ! operator.
Q1 is NOT 1:
Q1 is NOT less than 5 AND Q2 is NOT 2:
!(Q1<5 | Q2==2)
Q1 is NOT greater than or equal to 25 AND NOT less than or equal to 100:
!(Q1>=25 & Q1<=100)
Filters based on multiple-response questions
Using R to create filters based on Binary – Multi questions has the advantage of being able to reference a 'question' rather than simply individual variables. This is essentially the same as viewing your data as a table with each variable as a column and the rows as the respondents. When you have a long list of statements, you can therefore get away with specifying the least amount of columns for your condition to work.
The below uses the example of a binary question called 'Awareness - Colas' where each cola is a different variable.
1. 'Only' filter
Only aware of Coca-Cola:
q = `Awareness - Colas` rowSums(q[,"Coca-Cola",drop=F])>0 & rowSums(q[,!colnames(q) %in% c("Coca-Cola","NET"),drop=F])==0
Here, we are counting anyone who selected Coca-Cola and then summing per row any column that is not named 'Coca-Cola' or 'NET' and returning any that were not selected. In this case, it is easier to define the question as 'q' first when using it multiple times in the code.
Alternatively, you could replace rowSums(q[,"Coca-Cola",drop=F])>0 with the name of the variable, e.g. q2_1.
2. 'Count' filter
Aware of more than 1 cola:
q = `Awareness - Colas` rowSums(q[,!colnames(q) %in% c("NET"),drop=F])>1
Here, we are summing all the columns per row (excluding NET) and returning any that selected more than one. You can in turn change the value at the end to match the total number of items for creating an 'All' filter.
3. 'Any' filter
Aware of any colas:
rowSums(`Awareness - Colas`)>0
Here, we are summing all the columns per row and returning any that were selected.
4. 'None' filter
Aware of no colas:
rowSums(`Awareness - Colas`)==0
Here, we are summing all the columns per row and returning any that were not selected.
Filters based on dates
Date formats can vary but half the battle is won if you are using a standard date format. The below examples assume you have a Date/Time variable in your data set. If not, you will need to convert the date first. This may be as simple as changing the Structure to Date/Time, provided it is already in a recognizable format. Otherwise, you can convert strings to dates and adjust timezone, if necessary.
The below uses a date variable called 'StartDate'.
1. Before/After Date
'StartDate' is before October 2020:
StartDate < "2020/10/01"
Here, the date format must match that of 'StartDate', i.e. YYYY/MM/DD. If you are using a date control (via Insert > Control > Date) that selects the date, you can replace "2020/10/01" with the name of your date control.
2. Date Range
'StartDate' is between October and December 2020:
StartDate >= "2020/10/01" & StartDate <= "2020/12/31"
Again, the date format must match that of 'StartDate', i.e. YYYY/MM/DD.
3. Last/Next K period
'StartDate' is within the last 3 months:
library(lubridate) StartDate >= Sys.Date() - months(3)
Here, the 'months' argument is from the lubridate package so we need to reference this library. This can be changed to days, weeks or years, and you can change the – to a + for future periods. 'Sys.Date()' is simply the system date or the current date on the server.
4. Previous/Current/Next period
'StartDate' falls within the current quarter:
library(flipTime) Period(StartDate, "quarter") == Period(Sys.Date(), "quarter")
Here, the 'Period' argument is from our flipTime package so we need to reference this library. The quarter as defined by 'StartDate' is then compared to the current quarter based on today's date. The 'quarter' argument can be changed to day, week, month or year. You can also add the period arguments from the previous example to adjust when the period falls. For example, comparing 'StartDate' to next quarter is as simple as adding 3 months to today's date:
library(flipTime) library(lubridate) Period(StartDate, "quarter") == Period(Sys.Date() + months(3), "quarter")