31 January 2017
Introducing Displayr: The Data Science and Reporting App for Everyone
In the world of startups, it is said that if you’re not embarrassed by the first version of your product, then you’ve launched too late. I am happy to report that, more than 11 years after cutting our first line of code, we are ready to launch Displayr (pronounced “displayer”, not “display R”). We are not launching too late!
Displayr’s not yet at a stage where we are going to charge you to use it, but the beta version is ready for use. If you have found your way to us, please try it out. We hope you like it. If you have any feedback, please email us at firstname.lastname@example.org.
In this, the first post on our new blog, I provide a brief overview of what Displayr does, who it’s for, its key benefits, and the bits that embarrass us.
What Displayr does and who Displayr is for
Displayr is a web-based app for analyzing data, from surveys and transaction data through to internet of things and wearables data.
It replaces products like R, IBM SPSS Statistics, and SAS.
It is also a reporting app. You can share your analysis as a dashboard. You can export to PowerPoint and Excel.
It is for everybody. If you are a college student undertaking your first stats course, it’s going to make your life easy. If you are a hardcore data scientist, it will save you lots of time and reduce your chances of making mistakes.
No training required (you can figure it out)
Most data science apps take months to learn. In Displayr, you can start your data analysis in a few seconds. You don’t believe me? Try it yourself.
Don’t get me wrong: of course there’s some complicated stuff in there – particularly if you want to do hardcore data science without a background in that area. On the other hand, if you have a pretty standard data set and want to do some standard analytics, you should be able to figure out most things, and it will usually be quite easy. Best of all: if you get stuck, just email us at email@example.com.
The secret sauces
As I write this, we have been working on Displayr for more than eleven years. It has taken this long because Displayr differs from existing products in quite a few significant ways.
It’s all about the outputs, not the code
The screenshot below shows a page in a Displayr document. The page contains a title and a table showing results from a random forest that, in this case, predicts whether or not emails are spam based on their content. On the right side of the page you can also see a callout, where I have typed up some comments. If you are experienced in advanced data analysis, there is a good chance that you will look at this page and think:
“Ah! That’s the report, not the analysis.”
You would be mistaken. (Keep reading, and it’ll all become clear.)
The outputs are “live”: you can interact with them
The screenshot below shows the same random forest output. I have clicked on the output table, so you can now see the object inspector on the right-hand side. This shows the Inputs (i.e., which data and assumptions have been selected) and provides access to more technical Properties, such as the underlying R Code (see Introduction to Displayr 5: Machine learning and multivariate statistics). These are all live, which means that if you change the selected Predictors, the output will update. You can try this yourself.
Everything is organized: Documents, Pages, and Data
If you are experienced at data science and analysis, you will know that one of the great challenges is organization.
Displayr helps you organize your analyses by introducing two new data analysis concepts referred to as Documents and Pages. We also make extensive use of a rarely used existing idea: the Data tree.
When you log into Displayr, you first see a list of all of your Documents. Each document contains all of the data, calculations, and outputs generated for your project.
When you open a document, Displayr shows the Pages on the left of the screen. In the screenshot shown here, the page with the random forest is selected. Underneath it, you can see that it contains three objects: the Title, rf (which is the name of the random forest output), and Callout.
Pages can be grouped into folders. You can see the Cola Tables folder at the bottom of this tree.
Below the Pages tree, you find the second tree: Data. It lists all the data sets in the document; in this case, there are two: Cola perceptions and Email Spam. As with the Pages, folders group the data sets together, and within these folders, variables are grouped into variable sets.
The data and outputs are live: they automatically interact with each other
The links between the data and the outputs are live. If you modify the data in some way, the outputs will automatically update. For example:
- If you click on a data set, the object inspector will give you options, such as importing a revised data set, specifying the relationship between this data set and another data set, or modifying the data in some other way. When you make these changes, any outputs or derived variables will update automatically. You can try this yourself.
- If you click on a variable set – that is, an item in the list of data beneath the data set, such as Spam, make, and address, in the example here – you can modify its properties. Any outputs computed using that data will automatically update. See the post on variable sets for more information.
Safer data analysis: automated quality control processes
When you do data analysis in Displayr, a few great things are happening in the background:
- Displayr remembers everything you do. Any analysis you create is reproducible. There is therefore no need to write down your steps or save your code. Displayr does all that for you, always.
- If you make a change that has knock-on effects, Displayr automatically checks and updates everything. For instance, let’s say you merge two categories of a variable. Displayr then automatically updates all analyses that use that variable. Similarly, if you accidentally delete something, Displayr will alert you to everything you have inadvertently broken and allow you to undo the deletion.
- Displayr saves a history of older versions of your document (accessible on the Documents page).
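This post doesn't describe how Displayr tracks knock-on effects internally. As a rough illustration of the general idea only, here is a minimal, hypothetical Python sketch of dependency tracking: each item records which items it was computed from, changing an input recomputes everything downstream in dependency order, and the same traversal lists what deleting an item would break. The class and method names (`LiveDocument`, `set_data`, `define`, `broken_by_deleting`) and the example data are invented for this sketch and are not Displayr's API.

```python
from collections import Counter, defaultdict


class LiveDocument:
    """Hypothetical sketch (not Displayr's API): items that recompute when their inputs change."""

    def __init__(self):
        self.formulas = {}                  # name -> (function or None, list of input names)
        self.values = {}                    # name -> current value
        self.dependents = defaultdict(set)  # name -> names computed directly from it

    def set_data(self, name, value):
        """Store raw data, then refresh everything computed from it."""
        self.formulas[name] = (None, [])
        self.values[name] = value
        self._propagate(name)

    def define(self, name, func, inputs):
        """Define an output computed from other items, and compute it now."""
        self.formulas[name] = (func, list(inputs))
        for item in inputs:
            self.dependents[item].add(name)
        self._recompute(name)
        self._propagate(name)

    def broken_by_deleting(self, name):
        """List every item that directly or indirectly depends on `name`."""
        return self._downstream(name)

    def _recompute(self, name):
        func, inputs = self.formulas[name]
        if func is not None:
            self.values[name] = func(*(self.values[i] for i in inputs))

    def _propagate(self, name):
        for item in self._downstream(name):
            self._recompute(item)

    def _downstream(self, name):
        # Depth-first search; reversed post-order puts every item after its inputs.
        order, seen = [], set()

        def visit(node):
            for child in sorted(self.dependents[node]):
                if child not in seen:
                    seen.add(child)
                    visit(child)
                    order.append(child)

        visit(name)
        return order[::-1]


doc = LiveDocument()
doc.set_data("brand", ["Brand A", "Brand B", "Brand C", "Brand B"])
doc.define("counts", Counter, ["brand"])
doc.define("share_a", lambda c: c["Brand A"] / sum(c.values()), ["counts"])
print(doc.values["share_a"])  # 0.25

# Merge "Brand C" into "Brand A"; the downstream outputs update automatically.
merged = ["Brand A" if b == "Brand C" else b for b in doc.values["brand"]]
doc.set_data("brand", merged)
print(doc.values["share_a"])  # 0.5
print(doc.broken_by_deleting("brand"))  # ['counts', 'share_a']
```

The design choice in this sketch is to recompute only the items downstream of a change, in dependency order, so that no output is recalculated before its inputs are up to date.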
Do just about anything using R
While all the basic data analysis can be done in Displayr by dragging and dropping, the R data science language is also available within Displayr. Because of the combined power of R and Displayr, you can do just about any type of data analysis, from classical statistics to machine learning.
Collaboration: Displayr is for teams
When you conduct analyses in Displayr, you do so in documents. Lots of people can author these documents simultaneously. Documents can also be exported to PowerPoint or as webpages (dashboards), with all the code hidden, which makes them easy to share.
Displayr is currently in beta. You will find bugs and some little things that still embarrass us: some useful features are still missing, and one or two may not work as expected. You can check out our roadmap for more information on our future plans.
Please try Displayr yourself. It is designed for anyone to use, so you will find that playing around in Displayr quickly teaches you the ropes. However, Displayr has lots of unique ideas built into it, so we have prepared six blog posts that provide a guided tour of the app:
- Introduction to Displayr 1: Overview
- Introduction to Displayr 2: Getting your data into
- Introduction to Displayr 3: Creating tables, charts, and other visualizations
- Introduction to Displayr 4: Simple calculations
- Introduction to Displayr 5: Machine learning and multivariate statistics
- Introduction to Displayr 6: Reporting – automated and reproducible
You can find technical reference materials on our wiki.
Author: Tim Bock
Tim Bock is the founder of Displayr. Tim is a data scientist who has consulted, published academic papers, and won awards for problems and techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca-Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestlé. He is also the founder of Q (www.qresearchsoftware.com), a data science product designed for survey research, which is used by all seven of the world’s largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.