# How to Blank Cells with Small Sample Sizes using R in Displayr

Many researchers like to suppress statistics that are based on small sample sizes. This is often done to stop clients from misinterpreting the data.

In this post, I explain how you can automatically modify the contents of a table using a secondary R Output. Along the way, I give you a template of simple R code that you can adapt to your own scenario.

## Cell modification with R, a recap

In "How to Blank and Cap Cells of Tables Using R in Displayr", I explained how you can modify the cells of a table in an R Output by using a *condition*. The condition then becomes the subset of the table you are modifying. It works like this:

`table[condition] = value`

In English, the square brackets specify a subset of the table: the cells where the condition evaluates to `TRUE`. The equals sign then sets that subset to a new value. When blanking cells, that value is `NA` (which stands for a missing value).

Note: In either case, you need to add an extra line of code, which is just `table`. This returns the final table with the substituted values (and not just the value). This line is included as the last line of code in the examples below.
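To make the pattern concrete, here is a minimal sketch you can run in any R session. The matrix `tab`, its labels, and the cut-off of 10 are made up for illustration and are not from the worked example in this post.

```r
# Hypothetical 2 x 3 table of percentages (made-up numbers)
tab = matrix(c(12, 55, 8, 40, 31, 5), nrow = 2,
             dimnames = list(c("Male", "Female"),
                             c("Brand A", "Brand B", "Brand C")))

# The condition is a logical matrix: TRUE where the value is below 10
condition = tab < 10

# Set only the cells where the condition is TRUE to NA (missing)
tab[condition] = NA

# Final line: return the modified table itself
tab
```

Only the two cells below 10 (Male / Brand B and Female / Brand C) become `NA`; every other cell keeps its original value.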

## How to blank cells with small sample sizes

Now, to get R to blank the cells of a table that have small sample sizes, the code needs to reference the sample size behind each figure. There are a couple of different ways to give this information to R. I cover one way below and describe an alternative at the end of this post.

I like to have a source table that has both the values and the sample size within each cell. In the grid summary table below, I've specified both **%** and **Base n** as statistics.

This table is named `table.Q5`. Putting the following code in an R Output (**Insert > R Output**) will blank all the cells with a **Base n** less than 75.

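    x = table.Q5                      # source table
    y = 75                            # threshold for a small sample size
    values_tab = x[,,"%"]             # just the percentages
    base_tab = x[,,"Base n"]          # just the base sizes
    values_tab[base_tab < y] = NA     # blank cells whose base is below the threshold
    values_tab                        # return the modified table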

The first line specifies the source table. The second line specifies our threshold for a small sample size. The third line creates a table that has only the values (% in this case). The fourth line creates a table of just the base, which is the basis of the condition in the next line. The fifth line is the key that pulls it all together. It basically says *"if the base is less than the threshold of 75, then substitute a missing value (NA)"*. The sixth line just returns the new table of values (freshly substituted). The end result is a table of percentages in which every cell with a base below 75 is blank.
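If you want to try these six lines outside Displayr, the sketch below swaps `table.Q5` for a small hand-built array. It assumes that a table with two statistics behaves like a three-dimensional array whose third dimension holds the statistic names, which is what the `x[,,"%"]` indexing above implies. The row labels, column labels, and numbers are all hypothetical.

```r
# Hypothetical stand-in for table.Q5: 2 rows x 2 columns x 2 statistics
x = array(c(40, 25, 60, 10,       # "%" slice, filled column by column
            120, 60, 200, 74),    # "Base n" slice, same cell order
          dim = c(2, 2, 2),
          dimnames = list(c("Brand A", "Brand B"),
                          c("Attribute 1", "Attribute 2"),
                          c("%", "Base n")))

y = 75                            # threshold for a small sample size
values_tab = x[, , "%"]           # 2 x 2 matrix of percentages
base_tab = x[, , "Base n"]        # 2 x 2 matrix of base sizes
values_tab[base_tab < y] = NA     # blank cells whose base is below 75
values_tab
```

In this made-up data, both Brand B cells have bases under 75 (60 and 74), so the whole Brand B row is blanked while the Brand A row is untouched.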

## Adapting the code - having a separate table of values and base size

If you're borrowing the above code, *be sure that you've got the correct statistics in the source table.* For example, the **base n** in a cross-tab is different from the **column n**. The **column n** is what you use to derive the **column %**. Remember, in multi-variable questions (such as a Pick Any), the **base n** or **column n** could vary by row (or column). In the worked example above, each % in the source table came from a separate binary variable (grouped into a Pick Any - Grid), so each had its own **base n**.

You don't have to use just one source table to house all your reference statistics. You could keep the statistics in separate source tables, but you'd need to adjust the code accordingly, a bit like the below (where lines 1 and 2 refer to different tables in the document).

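    values = table.Q5         # table of percentages
    base = table.Q5.base      # separate table of base sizes
    y = 75                    # threshold for a small sample size
    values[base < y] = NA     # blank cells whose base is below the threshold
    values                    # return the modified table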

Be aware that the two tables need to match exactly: the same dimensions, with their rows and columns in the same order. That's why I prefer to use just the one source table (and extract what I need from that) wherever possible.
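One way to guard against misaligned tables is to reorder the base table by its row and column labels and then check that the labels agree before blanking anything. This is only a sketch: the two matrices are hypothetical stand-ins for `table.Q5` and `table.Q5.base`, and it assumes both tables carry the same set of labels.

```r
# Hypothetical stand-ins for table.Q5 and table.Q5.base
values = matrix(c(40, 25, 60, 10), nrow = 2,
                dimnames = list(c("Brand A", "Brand B"),
                                c("Attribute 1", "Attribute 2")))
# Same cells, but the rows are deliberately in a different order
base = matrix(c(60, 120, 74, 200), nrow = 2,
              dimnames = list(c("Brand B", "Brand A"),
                              c("Attribute 1", "Attribute 2")))
y = 75

# Reorder base by label so that it lines up with values
base = base[rownames(values), colnames(values)]

# Stop with an error if the labels still do not match
stopifnot(identical(dimnames(values), dimnames(base)))

values[base < y] = NA
values
```

Without the reordering step, the condition would be applied position by position and the wrong row would be blanked without any warning.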

And of course, you can fiddle with the code to produce a different outcome. For instance, you can set all the cells to `0` instead of `NA` if you prefer.

## Try it yourself

The worked example is in this Displayr document, so you can see the code in action.