2 Approaches to Quickly Narrowing Down 3,050 Crosstabs and Spotting the Important Ones in Displayr
Ready to save an enormous amount of time when sifting through thousands of crosstabs? This post describes which buttons to push in Displayr in order to implement two of the approaches described in "3 Ways to Quickly Sift Through 3,050 Crosstabs and Find the Magic One". The post starts by describing how to create lots of crosstabs, then looks at two ways to work through them: creating a heatmap that summarizes the crosstabs, and automatically deleting tables that are not significant.
Creating lots of crosstabs
The first step is to create lots of crosstabs. Follow these steps:
- Add a dataset. In this post I am using a file about mobile phones called phone.sav. This file is a bit messy and should be tidied, but let's not go down that rabbit hole just yet.
- Click Insert > Report
- Select Detailed report breaking down every variable set by a list of key variable sets and press Next.
- In the Key variable sets dialog box, select the variable sets to go across the top of your crosstabs and press Next. For the examples in this post, I've selected various five-point agreement scales: Allows to keep in touch, Technology fascinating, ..., Would like to do mobile banking with phone.
- Choose your report type, picking the option with just a table per page (shown to the right) and press Create Report.
You will now have many folders, each containing pages cross-tabbing all the variable sets in your project by the key variable sets that you selected. If you are using the phone.sav data set that I am using, you will have 3,050 crosstabs!
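The size of the report follows from simple multiplication: one crosstab per pairing of a variable set with a key variable set. Here is a minimal Python sketch of that count. It is not Displayr code, the row variable set names are hypothetical rather than taken from phone.sav, and it assumes every pairing yields exactly one table.

```python
from itertools import product

# Hypothetical variable set names, for illustration only.
variable_sets = ["Age", "Gender", "Work status", "Monthly spend"]

# Two of the agreement scales used as key variable sets in this post.
key_variable_sets = ["Allows to keep in touch", "Technology fascinating"]

# Assumption: the report holds one crosstab per (variable set, key variable set) pairing.
crosstabs = list(product(variable_sets, key_variable_sets))

print(len(crosstabs))
```

Adding one more key variable set adds a whole extra table for every variable set in the project, which is why the number of crosstabs climbs into the thousands so quickly.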
To see the p-values or z-statistics on any of the tables, click on them and select from Inputs > STATISTICS > Cells.
Creating a heatmap summarizing all the tables
The heatmap below (it may take a while to load) shows the z-statistics for all 3,050 tables, with darker blue for higher z-scores, and the z-scores capped at 5 (i.e., any value greater than 5 is changed to 5, as beyond 5 the differences are immaterial). Make sure you check out this blog post for the technical details on why the heatmap uses z-statistics rather than p-values!
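To see why capping at 5 loses little, it can help to convert z-statistics into p-values. The sketch below is plain Python rather than anything Displayr runs. The function names and example values are my own, and the two-sided normal p-value is an assumption about how the z-statistics are read.

```python
import math

def cap_z(z, cap=5.0):
    """Change any z-statistic greater than cap to cap; leave smaller values alone."""
    return min(z, cap)

def two_sided_p(z):
    """Two-sided p-value for a z-statistic under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Made-up z-statistics for illustration.
z_scores = [0.4, 1.96, 3.2, 5.0, 12.7]
capped = [cap_z(z) for z in z_scores]

for z, c in zip(z_scores, capped):
    print(f"z = {z:5.2f}  capped = {c:4.2f}  p = {two_sided_p(z):.2e}")
```

At z = 5 the two-sided p-value is already below one in a million, so larger values tell you nothing more about which tables matter, but they would stretch the color scale and wash out the rest of the heatmap.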
The heatmap was created by:
- Selecting all the folders containing all of the tables
- Insert > Utilities > Significance Testing > Identify Interesting Tables. This creates a table called most.significant.results.
- Insert > Visualization > Heatmap
- Set Inputs > DATA SOURCE > Output in 'Pages' to most.significant.results (it will be at the very bottom)
- Press CALCULATE. (Note that the table of interesting numbers does not automatically update if the input tables are changed; this is the exception that proves the rule that everything in Displayr automatically updates.)
If you follow these instructions you will get an output that looks a bit different to the one above. The key differences are that:
- I've cleaned and tidied the data prior to running the analysis.
- I modified the code in most.significant.results to exclude the column of SUMMARY tables, as shown below:
Automatically deleting crosstabs
An alternative to using a heatmap is simply to delete all the tables that are not significant. We can do this in Displayr by:
- Selecting all the folders containing tables.
- Automate > Browse Online Library and choosing one of the options. The smaller the p-value, the fewer tables will be left. These options delete results based on a corrected p-value, taking the multiple comparison settings into account.
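To make the idea of a corrected p-value concrete, here is a small Python sketch of one common correction, the Benjamini-Hochberg false discovery rate procedure. Which correction Displayr applies depends on the multiple comparison settings, so treat the choice of method, the function name and the example p-values as assumptions for illustration, not as a description of what the library options do internally.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank when sorted ascending (1 = smallest p-value)
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up smallest p-value from each of five crosstabs.
table_p = {"Table A": 0.001, "Table B": 0.012, "Table C": 0.02,
           "Table D": 0.2, "Table E": 0.6}

adjusted = dict(zip(table_p, benjamini_hochberg(list(table_p.values()))))

# A stricter cutoff keeps fewer tables.
for cutoff in (0.05, 0.01):
    kept = [name for name, p in adjusted.items() if p <= cutoff]
    print(cutoff, kept)
```

With these made-up numbers, three tables survive a cutoff of 0.05 and only one survives a cutoff of 0.01, which mirrors the trade-off you make when choosing between the options in the library.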