Once you have your data in Displayr, and have created some tables, the next step is to perform calculations using these tables as inputs. The results will be displayed in R Outputs.
Introducing R Outputs
An R Output is a type of calculation in Displayr. It has three defining properties:
- It is an output that appears on a page. It can be anything from a textbox or simple calculation through to data science algorithms or visualizations. Five simple examples are shown in the screenshot from Displayr below.
- It uses the R programming language to create the output. You can view the programming for these examples by opening the page below in Displayr and clicking on each R Output. The code will appear in the Properties tab of the object inspector on the right-hand side of the screen, under the heading R CODE.
- It has a Name. For example, the table at the top-left of the page is named mytable. To see the name, click GENERAL in the Properties window.
You can create an R Output by clicking Insert > Analysis > R Output, typing some code, and pressing Calculate. When this button is red, press it to update the calculation.
Performing calculations using the cells of a table
You will see 233.0 shown in green on the page above. If you click on it in Displayr, you will see that it contains the following code:
output1 = mytable[3, 2] + mytable[3, 3]
In the code, the Name of the R Output appears to the left of the equals sign: output1. The value assigned to it is the sum of mytable[3, 2] and mytable[3, 3].
The Name of the table at the top-left of the screen is mytable. In order to refer to something in a calculation you need to know its name. You can work out the name of any table, R Output, or data by clicking on it and then checking under Properties > GENERAL > Name.
The cell in the 3rd row and 2nd column of the table contains the number 64. Thus, mytable[3, 2] is 64. Similarly, mytable[3, 3] is 169. mytable[3, 2] + mytable[3, 3] makes 233, as shown.
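To try the same indexing outside Displayr, the sketch below builds a stand-in for mytable in plain R. Only the two cells in row 3 (64 and 169) come from the example above; the 5 x 3 shape matches the table described in this post, and every other value is made up for illustration.

```r
# Hypothetical stand-in for the Displayr table: 5 rows by 3 columns.
# Only the values 64 and 169 in row 3 are taken from the post; the rest are invented.
mytable = matrix(c( 10,  20,  30,
                    40,  50,  60,
                    70,  64, 169,
                    80,  90, 100,
                   110, 120, 130),
                 nrow = 5, byrow = TRUE)

# [row, column] indexing picks out a single cell.
mytable[3, 2]   # 64
mytable[3, 3]   # 169

# The same calculation as the R Output named output1.
output1 = mytable[3, 2] + mytable[3, 3]
output1         # 233
```

In Displayr itself you would not create mytable this way; the name already refers to the table on the page.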
Calculations using columns and rows of tables
The R Output in the bottom-left contains 5 numbers. If you click on it in Displayr you will see that it contains the following code:
output2 = mytable[, 2] + mytable[, 3]
This means that the R Output is called output2. It is the sum of the 2nd and 3rd columns of the table. As the table contains 5 rows, the output contains five values, one for each row. Calculations like this, which involve whole rows or columns of data, are known as vector arithmetic.
We could also have achieved the same result using the following code:
output2 = rowSums(mytable[, 2:3])
There are two aspects to this shorter code:
- mytable[, 2:3] represents all of the data in columns 2 through 3 of the table. If we had typed mytable[, 1:3] we would get all three columns in the table.
- rowSums computes the sum for each row.
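The two ways of adding columns can be checked against each other in plain R. This is a minimal sketch that assumes a made-up 5 x 3 matrix standing in for mytable (only the 64 and 169 in row 3 come from this post); the names byAddition and byRowSums are hypothetical and used only for the comparison.

```r
# Hypothetical stand-in for the Displayr table (values other than 64 and 169 are invented).
mytable = matrix(c( 10,  20,  30,
                    40,  50,  60,
                    70,  64, 169,
                    80,  90, 100,
                   110, 120, 130),
                 nrow = 5, byrow = TRUE)

# Leaving the row index blank selects every row of that column.
byAddition = mytable[, 2] + mytable[, 3]

# rowSums adds across the selected columns, one total per row.
byRowSums = rowSums(mytable[, 2:3])

length(byAddition)                 # 5, one value per row
byAddition[3]                      # 233, matching output1
all.equal(byAddition, byRowSums)   # TRUE
```

With only two columns the two forms are equally readable; rowSums becomes more convenient as the number of columns being added grows.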
We can also do the same type of calculation on rows. For example, mytable[2, ] + mytable[3, ] means that we want to sum rows 2 and 3. It will return 3 elements, one for each column.
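The row version can be sketched in the same way. The matrix below is again a made-up stand-in for mytable, and colSums is shown as the row-wise counterpart of rowSums; the names rowsAdded and viaColSums are hypothetical.

```r
# Hypothetical stand-in for the Displayr table (values other than 64 and 169 are invented).
mytable = matrix(c( 10,  20,  30,
                    40,  50,  60,
                    70,  64, 169,
                    80,  90, 100,
                   110, 120, 130),
                 nrow = 5, byrow = TRUE)

# Leaving the column index blank selects every column of that row.
rowsAdded = mytable[2, ] + mytable[3, ]
length(rowsAdded)                  # 3, one value per column

# colSums adds down the selected rows, one total per column.
viaColSums = colSums(mytable[2:3, ])
all.equal(rowsAdded, viaColSums)   # TRUE
```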
R Outputs with multiple lines of code
Each of our examples so far has illustrated very simple calculations. Often, there is a need for more complex calculations. This is done by writing multiple lines of code. The following lines of code repeat our earlier example of adding columns 2 and 3:
secondColumn = mytable[, 2]
thirdColumn = mytable[, 3]
output2 = secondColumn + thirdColumn
There are a few key things to appreciate in this example:
- In the R CODE, we are creating three objects: secondColumn, thirdColumn, and output2.
- The last object that is created becomes the R Output. The other objects, secondColumn and thirdColumn, are created as intermediate steps in the creation of output2.
- Only output2 remains after the computation has been completed, and only output2 can be referred to by other R Outputs.
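One way to get a feel for this behavior in plain R is the local() function, which runs several lines in a temporary environment and returns only the value of the last expression. This is an analogy rather than a description of how Displayr is implemented, and the matrix is again a made-up stand-in for mytable.

```r
# Hypothetical stand-in for the Displayr table (values other than 64 and 169 are invented).
mytable = matrix(c( 10,  20,  30,
                    40,  50,  60,
                    70,  64, 169,
                    80,  90, 100,
                   110, 120, 130),
                 nrow = 5, byrow = TRUE)

# The intermediate objects live only inside local(); the last expression is the result.
output2 = local({
  secondColumn = mytable[, 2]
  thirdColumn = mytable[, 3]
  secondColumn + thirdColumn
})

length(output2)   # 5
output2[3]        # 233
```

After this runs, output2 is available for further calculations, while secondColumn and thirdColumn were never created outside the local() block.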
The next blog post in this series, Introduction to Displayr 5: Machine learning and multivariate statistics, discusses more advanced data science features.
Author: Tim Bock
Tim Bock is the founder of Displayr. Tim is a data scientist who has consulted, published academic papers, and won awards for problems and techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q (www.qresearchsoftware.com), a data science product designed for survey research, which is used by all seven of the world’s largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.