People who are serious about understanding the data that you are presenting will want to know your sample size. After all, how can you tell if a result is even remotely interesting if you don't know whether it is based on 10 observations or 1000?
In Displayr, you can add a widget to your page which describes the sample size for the results you are presenting, as well as which filters are applied to the data. This allows your results to be understood at a glance, rather than leaving your viewers wondering "how many people are in this data?" or "is this really meaningful?". In this post I show you how to create a description of your sample, and how to customize its appearance. Best of all, the text updates itself whenever the person viewing your document applies a filter.
Adding the sample size description
In this example I am using data from a survey that asked people about their experiences travelling on a railway. The data I want to chart is from a survey question which asks respondents to rate how well the station staff communicated with them when their train was delayed. Out of a total of 9,263 people who answered the survey, this question was only asked of the 1,919 people who had experienced a late train.
Thus, the sample size I need to communicate is the 1,919 people who experienced a delay, not the 9,263 people who answered the survey.
To add the sample size description to my page:
- Select Insert > More > Data > Sample Size Description.
- Click on the Complete Data Variable in the Object Inspector on the right, and choose the same variable that I used in my chart.
- Change the Label to "People who experienced a delay".
- Tick Automatic. This ensures that whenever my data changes, including when a filter is applied, the sample size description works out the current sample size for my page and displays it correctly.
The basic text looks like this:
The key element is the Complete Data Variable. This needs to be a variable which has missing values for any observations that are not in the sample that you are describing. Missing values are special values in data which are often used to indicate when a case or observation is not to be included in the calculation of an average, percentage, or other statistic. In a survey like this one, missing values are given to people who were not asked a question. Thus, people who were not asked about the train delay have a missing value for that question. For this reason it is appropriate to select the variable used in the chart as the Complete Data Variable.
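To make the role of missing values concrete, here is a minimal Python sketch of the counting idea. It is an illustration only, not how Displayr computes the figure internally: the function name `sample_size`, the toy ratings, and the `page_filter` list are all hypothetical, and `None` stands in for a missing value.

```python
def sample_size(complete_data, page_filter=None):
    """Count cases that have a non-missing value and pass the filter.

    complete_data: one value per respondent, None meaning "missing".
    page_filter:   optional list of booleans, one per respondent
                   (a stand-in for a filter applied to the page).
    """
    if page_filter is None:
        # No filter applied: every respondent is eligible.
        page_filter = [True] * len(complete_data)
    return sum(
        1
        for value, keep in zip(complete_data, page_filter)
        if value is not None and keep
    )


# Hypothetical toy data: 10 respondents, 5 of whom were asked the question.
ratings = [4, None, 3, None, None, 5, 2, None, 4, None]
# Hypothetical filter that keeps only some respondents.
viewer_filter = [True, False, True, True, False, False, True, True, False, False]

n_all = sample_size(ratings)                     # unfiltered sample size
n_filtered = sample_size(ratings, viewer_filter) # sample size after filtering
```

Respondents with a missing value never contribute to the count, which is why the choice of Complete Data Variable determines the sample size that is reported.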
In some cases you may need to construct a variable which has the right cases for the Complete Data Variable. For more on this, see the last section below.
Formatting and placement
To change the appearance of the text that is displayed:
- Click on the text of the sample size description.
- Go to Properties > APPEARANCE in the Object Inspector on the right-hand side of the screen.
Here you can change the font and you can set a background color and border. For my example, I set the colors so that they blend well with my slide background.
Keep in mind that as filters are applied to the page, the text in your description will grow in length. You need to ensure that the container (the box that encloses your text) has enough room for the text; otherwise, scrollbars may appear. There are two solutions to this:
- Make the box wide - for example, along the bottom of your page (like the one above).
- Allow the text to wrap, by selecting Properties > Layout > Wrap text output, and make the box taller so that the text can tidily wrap over multiple lines.
Data setup
In the above example I was able to choose the Complete Data Variable to be the same variable as the one in my chart. If your chart shows filtered data, it is not appropriate to use the chart's variable, because that variable does not contain enough information about the sample shown in the chart. One approach is to create a new variable that has missing values for people who are not included on your page.
Say, for example, that I have a page showing charts or tables only for the males in my sample. In that case I can use the filter variable to define my sample. I can:
- Select my filter under Data Sets.
- Copy it by selecting Home > Duplicate. This is not strictly necessary, but for organization I like to keep multiple distinct copies of my variables when they need to be used for different purposes.
- With the new copy selected, click Properties > DATA VALUES > Missing values.
- For any categories that are not included in the sample, change the setting in the Missing Values column to Exclude from analyses, and click OK.
In the case of my filter variable, there are two categories:
- Selected, which means the person is included in the Males filter
- Not Selected, which means the person is not Male and not in the filter. By changing the setting above, these people now have a missing value for this variable.
The new variable can be used as the Complete Data Variable.
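The effect of the steps above can be sketched in Python. This is only an illustration of the recoding logic under assumed toy data, not Displayr's own mechanism: the helper `complete_data_from_filter` and the five-respondent `males_filter` list are hypothetical, and `None` again stands in for a missing value.

```python
def complete_data_from_filter(filter_values, excluded=("Not Selected",)):
    """Return a copy of the filter variable in which every excluded
    category is replaced by None (treated here as a missing value)."""
    return [None if value in excluded else value for value in filter_values]


# Hypothetical Males filter for five respondents.
males_filter = ["Selected", "Not Selected", "Selected", "Selected", "Not Selected"]

# The copy keeps "Selected" and marks "Not Selected" as missing.
males_complete = complete_data_from_filter(males_filter)

# Counting the non-missing cases gives the sample size for the page.
males_n = sum(value is not None for value in males_complete)
```

Working on a copy mirrors the duplicate step in the list: the original filter stays usable as a filter, while the copy serves only to define the sample.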