| 05 February 2017 |
It will not be immediately obvious, but Excel is a key inspiration for the design of Displayr. However, when you jump into Displayr you will be struck more by the differences than the similarities. In this post I compare how Displayr works relative to Excel.
This post is written with advanced Excel users in mind. If you are not an Excel power user, you would be better served by reading Introducing Displayr: the data science and reporting app for everyone.
Performing spreadsheet-like calculations
Consider the table below. It has been derived from a market research survey. The columns show how likely people said they would be to recommend the brands shown in the rows. A standard way of evaluating this data is to compute the Net Promoter Score (NPS), which is the percentage of people that gave a rating of 9 or 10, less the percentage that gave a rating of 0 through 6. In Excel, you would compute this analysis by getting the data into Excel, and then using a formula. For example, if you pasted this table into cell A1 in an Excel sheet (open Excel and try this out), you might compute the NPS for Apple by using the formula =SUM(K2:L2)-SUM(B2:H2), i.e. the 9-10 score total minus the 0-6 score total.
In Displayr, you can perform this analysis with a very similar calculation: sum(table.1[1, 10:11]) – sum(table.1[1, 1:7]). Click here to try it for yourself. A few things to note:
- In Displayr, the distinction between uppercase and lowercase is important: sum and SUM are not the same thing.
- Displayr calculations require you to mention the name of a table. In this case, table.1. This is often true in Excel as well, where you refer to sheet names or named ranges.
- In the table[x,y] reference, the x values refer to the rows of data, and the y values to columns. It excludes row headings. Thus, the 1 in the sum(table.1[1, 10:11]) – sum(table.1[1, 1:7]) formula relates to Apple which is the first row of data, and columns 1 to 7 refer to the columns Not at all likely 0 through to 6. For the table[x,y] reference, if x is blank then all of the rows are returned e.g. table.1[,y] refers to all rows of table.1 for column y; the same applies for columns if y is blank.
- There are a number of other ways we could have written the formula. I describe the best ways at the end of the post, but for the moment note that we would get the same answer if using sum(table.1[“Apple”, 10:11]) – sum(table.1[“Apple”, 1:7]) or table.1[1, 10]) + table.1[1, 11]) – sum(table.1[“Apple”, 1:7]).
The R CODE box
The big difference between Displayr and Excel in this example relates to where the calculation is placed. In Excel, I would put it in a new column to the right of the existing columns. In Displayr, we instead select Insert > R Output and enter the code into the R CODE box on the right of the Displayr window and press Calculate. This puts the output of the calculation in an entirely new object on the page.
As you use Displayr, to find out more about a table, graph, or calculation you will typically look in the R CODE box, which you can see by clicking on the table or graph. The R CODE box is the closest equivalent to the formula field in Excel, although as we’ll see later the R CODE box provides users with a phenomenal increase in power and flexibility.
The Displayr equivalent to copying and pasting a formula: vector arithmetic
In Excel, once you had computed the formula for Apple, you would likely have copied and pasted it to apply to all the other brands. In Displayr, we instead modify the formula, to use vector arithmetic. This is fancy way of saying that we write formulas that do the same thing many times. Excel power users will be familiar with this, as it is the same idea as Excel’s Array Formulas.
If we type table.1[ , 10:11], Displayr interprets this as meaning we want all the rows of data and columns 10 and 11 (i.e., we want the vectors representing columns 10 and 11). This may lead you to think we can just change our formula to sum(table.1[ , 10:11]) – sum(table.1[ , 1:7]). But this would not work, as it would compute the total of columns 10 and 11 less the total of columns 1 through 7, i.e. it sums all of the rows together rather than summing each of the rows separately. Instead in Displayr we use the rowSums(x,y) instruction. We write rowSums(table.1[, 10:11]) – rowSums(table.1[, 1:7]), noting that you need to make sure the ‘S’ is capitalized. The good news is that if you do this, Displayr will even put the row labels on for free:
The Displayr equivalent to copying and pasting a formula: copying and pasting
If you are an experienced Excel user, you will be in the habit of copying entire sheets and, if needs be, changing a few formulas (particularly those with $ in them). In Displayr, you can do the same thing. Go into the example project that I have created for this blog post: you will see the two tables above located together on the first page. Copy and paste that first page to create a second page, and the new page will contain exactly the same tables. However, the formulas in the second table (which shows the NPS) update, and now refer to the second table of data. That is, in Excel language, the formulas are copied as being relative.
Fixed versus relative references in Displayr
In Excel, you fix a formula by typing $, naming the cell, or pressing F4 one, two, or three times. In Displayr, references become fixed or relative based on whatever you copy and paste. For example if you first copied and pasted the original data table (with the scores for 0-10 responses), and then copied and pasted the calculated NPS table, the new copy of the NPS table would refer back to the original table, not the copied table. If my explanation makes no sense to you, please try it yourself and it should be a lot clearer!
The Displayr equivalent to naming sheets
In Displayr, table.1 is basically the same idea as Sheet 1 in Excel. Just like in Excel, we can improve the readability of your formulas by renaming the table. You rename something by clicking on it, and changing the name in Properties > GENERAL > Name, which is located on the right side of the screen. And just like Excel, if you do rename something, everything else automatically updates (i.e., there is no need to rewrite any other formulas).
If you want to save a bit of time, you can rename things directly by typing them into the R code. For example, setting the name of the table to recommendations, and inserting an R Output, containing the following code, that R Outputs name becomes “NPS“:
NPS = rowSums(recommendations[, 10:11]) - rowSums(recommendations[, 1:7])
Should I then insert another R Output, but type into it mean.nps = mean(NPS), the new R Output will have the name of mean.nps (note the use of punctuation to make names more readable), and it will compute the mean of the NPS in the R Output called NPS.
Formulas over multiple lines
So far in this post I have been illustrating the similarities and differences between Excel and Displayr. You may well be thinking at this stage, “what is the point of using Displayr, sure it works a bit like Excel, but is it easier and faster to use?”. In the rest of the post I focus on things that Displayr does that are painful to do in Excel.
The first of these things that cannot readily be done in Excel is that you can create calculations that have lots and lots of different calculations within them. For example, the following four lines of code will compute the average NPS for all the brands. For complex calculations, this is both a lot faster and easier to manage, and a lot faster, than having lots of interlinked calculations:
promoters.by.brand = rowSums(recommendations[, 10:11]) detractors.by.brand = rowSums(recommendations[, 1:7]) nps.by.brand = promoters.by.brand - detractors.by.brand average.nps = mean(nps.by.brand)
Many, many, more functions (via R)
While Excel has quite a few formulas in it, Displayr has several thousands of data science formulas. This is because the code in all the examples above is written in the data science programming language called R. Whenever we create an R Output, Displayr sends the encrypted data and code, to a server that running R. This server computes the calculations and returns the results to Displayr.
Ease of creating tables
One of the great benefits that Displayr has over Excel relates to creating tables.
The example table below resulted from the raw survey data that we started with above. If you are an Excel gun, perhaps you could create the table below in 10 minutes, provided that you had a deep understanding of the structure of the underlying data, you knew how to create pivot tables, and somebody had given you the data file in a friendly format. If you are not an Excel gun, you would have had to get somebody else to create it in a special-purpose survey analysis package, such as Q or SPSS. In Displayr, we create it very simply, by:
- Importing the raw data (Home > Data Set > Data)
- Dragging the Variable Set from the list of Variable Sets (which is found in the Data section in the bottom left) onto the page (see Understanding Variable Sets in Displayr for more information).
Drag and drop to merge categories
Earlier in the blog I showed you we could sum up the detractor scores for each brand using rowSums(recommendations[, 10:11]). I did this to make the point that Displayr and Excel can do things similarly. However, there is a much smarter way to do this calculation in Displayr:
- Use your mouse to select the column headings of the first 7 columns.
- In the ribbon menu, go to Data Manipulation > Rows/Columns > Merge and then Rename.
- Use drag and drop to merge the columns in the Passive and Neutral categories, as shown below.
Note that if you try it yourself in Displayr, the various other calculations we have performed will break, as they are all premised on the original structure of the data. This is, of course, exactly the same problem you would have in Excel when you delete columns! You can fix this by creating and modifying a copy of the table on any Displayr page.
Quality assurance and automation
The benefit of merging is not merely one of speed, as Displayr also does the following things simultaneously:
- Displayr keeps a record of what the categories were prior to you merging them. Look at this example in Displayr and hover your mouse over the column headings. Displayr will tell you what the categories used to be.
- You do not have to write code. This reduces the chance of an error. For example, with a table like this, people can get confused about whether the Not at all likely category is stored in the database as a 1 or a 0. This, combined with the previous points gives you much greater confidence in your quality assurance processes.
- If you change your mind later, you can pull them back apart (Data Manipulation > Split).
- If you need to redo the analysis later, you can select the data set under Data, and import a revised data set. Displayr will apply this and all other calculations to the revised data set, thus making your report automated.
Excel is great for many forms of data manipulation due to its spreadsheet layout, where the user can perform whatever calculations they want in the various cells. However, it requires many advanced calculations using the raw data. For example, if you look at the table above you will see color-coding and arrows. This tells us that IBM’s detractor score is relatively low, whereas Google, Sony, and Panasonic all have relatively high Promoter scores. “High” here means statistically significant: there is a good chance that if the study were to be repeated it would reach similar conclusions. Whenever you create a table in Displayr by dragging and dropping data, Displayr will automatically perform such statistical analysis. Displayr can do this because it has access to all the raw data. If, instead, you only have the final tables, such tests are difficult and often impossible to perform in Excel.
A variable-based view of the world
The table below shows the raw data used to create the table above. Each row represents the data for a person that completed the survey. Each column shows the person’s rating of the brands. An NaN stands for Not a Number (i.e., the person provided no rating). A 0 indicates that they were not at all likely to recommend the brand, a 1 indicates a rating of 1/10, etc. The tables above have been created from such raw data. However, rather than compute NPS from the tables, a smarter alternative involves creating and recoding new variables. My post on The NPS recoding trick goes through this in detail.
Author: Tim Bock
Tim Bock is the founder of Displayr. Tim is a data scientist, who has consulted, published academic papers, and won awards, for problems/techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q www.qresearchsoftware.com, a data science product designed for survey research, which is used by all the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.