Extracting Results from Tables as R Outputs
This post describes how to extract results from tables in Displayr. The results are extracted as R Outputs. These R Outputs can then either be formatted or used as inputs to visualizations when creating dashboards and reports.
Tables and all other objects in Displayr have names. In order to extract data from a table we need to know two things: its name and how many dimensions it has. There are two ways you can see the name of a table. You can click on the Expander to the left of the page name in the pages tree, which will show the names of all of the outputs on the page.
Alternatively, you can click on the table and view its name in the Object Inspector (Properties > GENERAL > Name).
By default, the name is composed of three elements: the word table, the name of the variable sets used to construct the table (e.g., Age), and a number if there are multiple tables created from the same variable set. You can change the name; it will automatically reset if the input data changes.
Extracting data from a one dimensional table
The age table shown above above is one-dimensional, which is to say it has a single column of numbers. To extract numbers from the table we need to create an R Output, which is an object that appears on the page and will store these numbers. This is done by clicking Insert > R Output and then either typing in the name of the table or dragging it from the pages tree, and then subscripting the table. As an example, the code below extracts the value from the table above of people aged 25 to 29.
As 25 to 29 is the second category in the table, we can get the same result by typing table.Age.
With a table with a single number in it, we do not need to use subscripts at all and can enter the name of the whole table. Or, we can reference the name of the row, as shown below.
There are some rare situations where a table like the one shown above will in fact have two dimensions. If this is the case you will see a message saying Incorrect number of dimensions and will need to treat it as a two-dimensional table (discussed below).
Extracting multiple values
We can extract multiple values by listing which cells in the table to extract, as shown below. Displayr also shows the corresponding row labels of the table. These will generally be ignored in any ensuing calculations.
We can do the same thing with numbers instead of labels:
Consecutive cells can be selected more efficiently using:
We can also indicate which cells not to use via negative signs, as done here:
Excluding cells is most commonly useful when we are excluding nets or totals, which tend to appear in the last cell in the table. This example excludes the last row, where NROW is a function that works out the last row of a table (i.e., NROW(table.Age) is 10):
Extracting results from two-dimensional tables
The principles of extracting results from two-dimensional tables are the same as those for one-dimensional tables, except that we need to specify the rows and the columns. If you see an error of Incorrect number of dimensions, this means that your table is not really two-dimensional, and you need to see the section below on higher dimensional tables.
In this section we will extract data from the table shown below, called table.Age.by.Gender.
To extract a single cell from the table, we enter both its row and column name, separated by a comma.
To extract a whole column, we just leave the row section of the subscripting blank (i.e., the bit to the left of the comma).
This example extracts rows two, three, and four, for the "Male" column.
This example extracts all but the last row and the last column.
We can write this more efficiently by assigning the table a shorter name in the R code:
In the example below, we still have a crosstab of age by gender, but the count has also been added to the table (Object Inspector > STATISTICS > Cells: Count). This table now has three dimensions: age, gender, and the statistics. We will get an error of Incorrect number of dimensions if we attempt to subscript it and ignore the third dimension.
If we add a comma, we get both statistics.
We can also choose what we want from the third dimension, using the same principles as seen in the earlier examples.
Higher-dimensional tables (arrays)
The table below has five dimensions. Such data is usually referred to an array in R.
We subscript it using the same principles as with the other examples.
With higher dimensional tables it can be a bit difficult to work out which labels need to go in which position in the square brackets. The best way to find this out is to use the dimnames function. It shows in double square brackets the different dimensions and their categories, as shown in the two examples below.
Safety first: defensive programming
A lot of people new to writing code decide to take the fast option and use code like table.Age[2:3, -11], rather than table.Age[c("25 to 29", "29 to 34", "35 to 39"), -NCOL(table.Age)], because it is much faster to type. It is possible that the input table may change in some way and the code ends up giving the wrong answer. This would occur, for example, if the 18 to 24 category was removed from the table, or if the age data was replaced with the income data in the table. By contrast, the more verbose table.Age[c("25 to 29", "29 to 34", "35 to 39"), -NCOL(table.Age)] is a good example of defensive programming. If the 18 to 24 category were to be removed, it would still give the right answer, and if the age data was replaced by income it would return an error, allowing you to identify and rectify the problem.