Using R to Extract Information From an Object in Displayr
Sometimes it is useful to extract a particular bit of information from an object, such as a table or an R Output. For example, you may want to extract the information for a particular brand from a table that shows data for several brands, and then use that information in another output, such as a text box. This post describes the various ways that you can extract information from an object. In the first example, I extract the market share of a brand from a table and link it to an automatically generated text box (using R). In the second example, I extract information from an R Output (a regression model).
The standard way of extracting information from a table is via subscripting. Consider the table below. If you click on it in Displayr, you will see its name is table.Preferred.cola.by.Gender (the Name can be seen via the Object Inspector > Properties > GENERAL > Name).
If you create a new R Output (using Insert > R Output) and use the R CODE of table.Preferred.cola.by.Gender[3, 2], it will return the result of 20.7. The code in the brackets references the row and column ([row, column]). In this case, it is the underlying value in the third row and second column of the table (excluding row and column titles). Alternatively, instead of referencing the position, we could reference the row and column titles:
table.Preferred.cola.by.Gender["Coke Zero", "Female"]
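To experiment with both forms outside Displayr, here is a minimal sketch using a stand-in matrix. The object name cola, the other two brand names, the Male column, and every number except the Coke Zero / Female cell are made up for illustration; only the value in the third row and second column comes from this post.

```r
# Stand-in for table.Preferred.cola.by.Gender (illustrative values only).
cola <- matrix(
  c(50.0, 10.0, 15.0,               # "Male" column (made up)
    45.0, 12.0, 20.7407407407407),  # "Female" column; only Coke Zero is from the post
  nrow = 3,
  dimnames = list(c("Coca-Cola", "Diet Coke", "Coke Zero"),
                  c("Male", "Female"))
)

by.position <- cola[3, 2]                   # third row, second column
by.name     <- cola["Coke Zero", "Female"]  # same cell, referenced by titles

identical(by.position, by.name)
```

Referencing by title is usually the safer habit, because the code keeps pointing at the same cell if rows or columns are later reordered.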
You can then use these extracted results as either inputs to other calculations or to write automated text. For example, the code below generates the text shown beneath it.
paste0(table.Preferred.cola.by.Gender["Coke Zero", "Female"], "% of Females prefer Coke Zero")
Note how the number of decimals has changed! This is not a bug. Displayr always remembers the underlying value. In this case, the true underlying value is 20.7407407407407. However, by default, Displayr shows no decimals for percentages when they are in a table, and one decimal when showing a numeric result in an R Output (you can change this using Appearance > Number in the ribbon). In the last example, it is showing all the decimals because the value is being treated as text.
So, in the case of the text box, it is up to us to give explicit formatting instructions. In the example below I have:
- Split the code up into three lines to make it easier to read.
- Given formatting instructions to show the number as a percentage with no decimal places. The /100 bit is because the function called FormatAsPercent expects a proportion as an input.
- Used the various formatting options in Object Inspector > Properties > LAYOUT.
p = table.Preferred.cola.by.Gender["Coke Zero", "Female"]
p.formatted = flipFormat::FormatAsPercent(p / 100, decimals = 0)
paste0(p.formatted, " of Females prefer Coke Zero")
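FormatAsPercent comes from the flipFormat package, which may not be installed if you are experimenting in plain R. As a rough stand-in (it is not the same function, and its behavior may differ in edge cases), base R's sprintf can produce similar text. The value of p below is the underlying value quoted earlier in this post.

```r
# Underlying value of the Coke Zero / Female cell, as quoted in the post.
p <- 20.7407407407407

# "%.0f" rounds to zero decimal places; "%%" prints a literal percent sign.
# Unlike FormatAsPercent, sprintf is given the percentage itself, so no / 100.
p.formatted <- sprintf("%.0f%%", p)

sentence <- paste0(p.formatted, " of Females prefer Coke Zero")
print(sentence)
```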
Subscripting more complicated objects
Subscripting can also be applied to more complicated outputs. Consider the regression results below.
The Name of this object is glm. To extract data from it via subscripting, we need to know a bit more about its structure. The simplest way to find out is to create a new R Output containing the code names(glm). The result is a table listing the name and position of each item contained within glm.
We can see that the 5th item is n.predictors, so we can extract it with glm[5]. Note that whereas with the table we used [3, 2], meaning the third row and second column, here we do not need the comma because the object has only one dimension.
However, when we do this we get a bit of baggage: rather than just the result, we also get its name. If we just want the result, we instead use double square brackets: glm[[5]].
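The difference between single and double brackets is easy to see on a small list. The sketch below uses a hypothetical stand-in called model, not the real Displayr regression object; its item names and values are invented, apart from the name n.predictors, which is borrowed from the output above.

```r
# Hypothetical stand-in for a regression output (names and values are made up).
model <- list(type = "Linear", n.observations = 100L, n.predictors = 3L)

names(model)              # lists the item names in order

with.name  <- model[3]    # single brackets: a one-item list that keeps its name
just.value <- model[[3]]  # double brackets: the value itself

str(with.name)
str(just.value)
```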
Often it is useful to apply subscripting multiple times. For example, if I type names(glm[[17]]), the result reveals the names of the items in the 17th item of glm, which is the summary. I see that the 9th item is adj.r.squared. This means that to extract the adjusted R-squared statistic I use glm[[17]][[9]] (I also used Appearance > Number in the ribbon to increase the number of decimals).
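As a runnable illustration of chained subscripting, the sketch below uses a small hypothetical list whose third item is itself a list. The names mirror the ones mentioned in the text, but the positions and values are invented and will not match the real glm object.

```r
# Hypothetical stand-in: a list whose third item is itself a list.
model <- list(
  type = "Linear",
  n.predictors = 3L,
  summary = list(r.squared = 0.25, adj.r.squared = 0.21)  # made-up values
)

names(model[[3]])          # names of the items inside the nested list

adj.r2 <- model[[3]][[2]]  # third item of model, then its second item
print(adj.r2)
```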
The example above is a bit messy. If you mistype either index in glm[[17]][[9]], you will get a different answer, but you may not spot the mistake. Fortunately, we can instead reference parts of an object by name. For example:
This example uses $ to extract the whole table of coefficients.
We can combine subscripting with $. For example, to see the coefficients of the regression above, we could type glm$summary$coefficients. Then, to extract all the standard errors from the model, we can type glm$summary$coefficients[, 2] or, equivalently, glm$summary$coefficients[, "Std. Error"]. Leaving the space before the comma empty returns all the rows, and specifying a column after the comma returns just that column. Note that the column name used here, Std. Error, differs from the label in the original output table at the top (Standard Error). It is unusual for them to differ in this way, but R is full of such inconsistencies...
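The same pattern works on regression objects in plain R. The sketch below is an illustration rather than the Displayr model above: it fits a linear model to the built-in mtcars data and reads from the object returned by summary(), which here plays the role that glm$summary plays in the text.

```r
# Fit a simple linear model on a built-in dataset (not the cola data).
fit <- lm(mpg ~ wt, data = mtcars)
fit.summary <- summary(fit)

coefs <- fit.summary$coefficients     # whole coefficient table (a matrix)
se.by.position <- coefs[, 2]          # all rows, second column
se.by.name <- coefs[, "Std. Error"]   # all rows, column chosen by name

print(se.by.name)
print(fit.summary$adj.r.squared)      # a single item extracted by name
```

Selecting the column by name makes the intent visible in the code, which helps when you come back to it later.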
There are three main ways in which objects are structured in R:
- Tabular data is most commonly stored in vectors, arrays, and matrices, which are all subscripted using square brackets, as illustrated earlier.
- When the data cannot be represented in a table, it is typically stored as a list. This is how the regression results were stored. Items are extracted from lists using double square brackets (e.g., glm[[17]]) or $.
- An object can have additional attributes saved with it as well. For example, consider our earlier table called table.Preferred.cola.by.Gender. If we type attributes(table.Preferred.cola.by.Gender), we get:
We can then extract these attributes by subscripting or using $. For example, attributes(table.Preferred.cola.by.Gender)$dim returns the dimensions of the table, and attributes(table.Preferred.cola.by.Gender)$dim[1] returns just the number of rows.
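Here is a minimal sketch of reading attributes from a placeholder matrix. The matrix, its labels, and its values are invented; a plain R matrix like this one carries only dim and dimnames, whereas a Displayr table may carry additional attributes.

```r
# Stand-in matrix with 3 rows and 2 columns (values are placeholders).
cola <- matrix(1:6, nrow = 3,
               dimnames = list(c("A", "B", "C"), c("Male", "Female")))

attrs <- attributes(cola)  # a named list of everything attached to the object
names(attrs)

n.rows <- attrs$dim[1]     # the first element of dim is the number of rows
print(n.rows)
```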
We can also use R functions to extract specific bits of information from objects. For example, to extract the dimensions of an object (for a table, the number of rows and columns), we can use the dim function, as in dim(table.Preferred.cola.by.Gender).
If we wanted to extract just the number of rows, we could use dim(table.Preferred.cola.by.Gender)[1] or, more briefly, NROW(table.Preferred.cola.by.Gender).
R is case sensitive, and there is another function for getting the number of rows called nrow. In this example, both functions give the same answer, but they do not always: if you have a vector of 9 elements, for example, NROW will return 9, whereas nrow will return NULL. I tend to use NROW most of the time.
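The difference between the two functions can be checked directly. This sketch uses a placeholder matrix and a plain vector of 9 elements; neither comes from the cola table.

```r
# Placeholder 3 x 2 matrix and a plain vector of 9 elements.
m <- matrix(1:6, nrow = 3)
v <- 1:9

dim(m)      # both dimensions: rows, then columns
dim(m)[1]   # just the number of rows
nrow(m)     # same answer for a matrix
NROW(m)     # same again

nrow(v)     # NULL: a plain vector has no dim attribute
NROW(v)     # 9: NROW treats a vector as a one-column matrix
```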
Yet another way is to use the attr function:
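A minimal sketch of the attr approach on a placeholder matrix follows. The matrix is illustrative, and the custom attribute named note is invented purely to show that attr works for attributes other than dim.

```r
# Placeholder matrix (not the cola table).
m <- matrix(1:6, nrow = 3)

attr(m, "dim")                     # same information as dim(m)
n.rows <- attr(m, "dim")[1]        # number of rows
print(n.rows)

attr(m, "note") <- "illustrative"  # attr can also set an attribute
attr(m, "note")
```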
There are other more exotic approaches as well. Sometimes an @ is used instead of $. It is not common, but keep it up your sleeve if all else fails.
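For context, @ reads a slot from an object built with R's S4 class system, which is where you are most likely to meet it. The class and slot names below are hypothetical, chosen only to make a runnable illustration.

```r
library(methods)  # normally attached already; harmless to load explicitly

# Hypothetical S4 class with two slots.
setClass("BrandShare", representation(brand = "character", share = "numeric"))

x <- new("BrandShare", brand = "Coke Zero", share = 20.7)

x@share            # slot access with @
slot(x, "brand")   # equivalent function form
slotNames(x)       # the S4 counterpart of names()
```

If isS4(x) returns TRUE for an object, @ and slotNames are worth trying before $ and names.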
What to use when?
So, how do you determine whether to use subscripting, $, @, functions, or attributes? My own approach is to guess, use the names function, and apply a bit of trial and error. A more formal approach is to use the str function, which reveals the structure of the data, but I often find its output a bit confusing.
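For reference, here is what str looks like on a small, hypothetical list (the names and values are invented). Each line of the output shows an item's name, its type, and a preview of its contents, with nested items indented beneath their parent.

```r
# Hypothetical stand-in for a regression output (made-up names and values).
model <- list(
  type = "Linear",
  n.predictors = 3L,
  summary = list(r.squared = 0.25, adj.r.squared = 0.21)
)

str(model)

# Quick checks that hint at which extraction approach to try first.
is.list(model)    # TRUE suggests [[ ]] or $
is.matrix(model)  # TRUE would suggest [row, column]
```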
If you have the time, read a book or do a course on R, and you will learn about classes and types, after which all this becomes a bit more intuitive. Failing that, trial and error is the most straightforward approach, trying things in the order I list above.
Playing with the examples
All the examples in this post are illustrated here in a Displayr document that you can tinker with yourself (trial and error!).