R | Visualization
16 September 2017 | by Tim Bock

7 Alternatives to Word Clouds for Visualizing Long Lists of Data

Creating a meaningful visualization from a long list of data can be challenging. Word clouds are the popular choice, but they are not always the best option. This post illustrates seven alternatives to word clouds for visualizing data from long lists, each with its own trade-offs. The examples use the GDP of 185 countries and were created using R.


The common option: A word (phrase) cloud…

The visualization below is a phrase cloud: it shows the full names of countries (i.e., phrases) rather than single words. The size of each country’s name is in proportion to its GDP. While word clouds are often ridiculed, they do scale well. Unlike most charts, a word cloud gets better the more things it displays. But word clouds are far from perfect. The rest of this post explores some better alternatives.

Word cloud of global GDP


Alternative 1: Circle packing

One standard “fix” for word clouds is to create a bubble chart, using a circle packing algorithm to arrange the bubbles. This avoids the distortion that differing word lengths bring to word clouds. However, despite the appeal of these charts, in this case the cure is worse than the illness. The bubbles are too small to hold labels for all the countries, so I have had to put the names into tooltips, which appear when you hover your mouse over the bubbles (click on the visualization to view). While I love these plots, I am not a great fan of tooltips for critical information. You can, no doubt, appreciate this point if you are reading this on a mobile device or the R-Bloggers website, where the tooltips cannot be seen unless you click on the visualization.

Bubble chart with circle packing

Click the image for an interactive version.


Alternative 2: Cartogram

Rather than packing the circles close together, we can spread them out on a map. I have done this in the cartogram below. The resulting visualization, in most regards, improves on the visualizations above. Problems, however, occur here too. The cartogram relies on a firm understanding of geography, and it fails completely for Europe, where overplotting causes issues. If you have a scroll wheel on your mouse you can zoom in (go to the interactive cartogram). Nevertheless, just as with including names in tooltips (as done with the circle packing), this is a salve rather than a cure. The IMF, who provided the data used in this post, have created a nicer interactive cartogram if you want to see how to do this better.

Cartogram

Click the image for an interactive version.


Alternative 3: Choropleth

A choropleth solves the cartogram’s overplotting problem. However, it introduces a different problem. The choropleth below gives a very poor understanding of the distribution of GDP, essentially splitting the world into three tiers: the US, China, and everyone else.

 

Choropleth

Click the image for an interactive version. 


We can improve our ability to distinguish between the countries with smaller GDP by changing to a multi-color scale and transforming the data, as shown below. This does a much better job of allowing us to understand Africa. It also brings to the fore the poor state of the economies of central Asia, a feature not emphasized by any of the other visualizations. However, this sharper discrimination among the smaller economies comes at a large cost. The naked eye struggles to discriminate between the bigger economies (e.g., Australia vs the US). Furthermore, just as the word cloud struggles when words differ in length, the choropleth has its own biases relating to the size of the countries. For example, Japan and Europe can easily be overlooked on this map.

Choropleth with multi-colored scale

Click the image for an interactive version.
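The post does not say which transformation was applied to the data for the multi-color choropleth. A logarithm is a common choice for values that span several orders of magnitude, so the minimal sketch below assumes one. It is written in Python purely for illustration (the charts in this post were created in R), the function names are hypothetical, and the inputs are made-up magnitudes rather than GDP figures.

```python
import math


def linear_scale(values):
    """Rescale values to the range [0, 1] without transforming them."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def log_scale(values):
    """Rescale positive values to [0, 1] after taking log10 (an assumed transform)."""
    logs = [math.log10(v) for v in values]
    lo, hi = min(logs), max(logs)
    return [(x - lo) / (hi - lo) for x in logs]


# Made-up magnitudes spanning four orders of magnitude (not GDP data).
values = [1, 10, 100, 1000, 10000]

# Position of each value along a color scale, before and after the transform.
print([round(p, 4) for p in linear_scale(values)])
print([round(p, 4) for p in log_scale(values)])
```

On the linear scale, every value except the largest is squeezed near the bottom of the color range. On the log scale, the values are spread evenly. That is the trade-off described above: the smaller values become distinguishable, but the visual gap between the largest values shrinks.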

Geographic visualization probably works the best for this particular data set. The next few visualizations are much more generally applicable, as they can be used for non-geographic data.


Alternative 4: The horn of plenty

The visualization below takes the bubbles from the cartogram and circle packing and orders them by size, which creates a surprisingly effective way of visualizing the distribution of GDP. However, once more the critical information about which country is which is hidden in tooltips (click on the image), making this a poor visualization for most problems.

The horn of plenty visualization

Click the image for an interactive version

We can make the point that the US and China are the world’s largest economies by adding labels. However, this is not such a compelling improvement. Most viewers could likely have guessed what these labels tell them anyway.

The horn of plenty visualization with labels


Alternative 5: Treemap

All the previous bubbles and plots showed size proportional to diameter, which provides a challenge to most quantitatively-oriented minds, and certainly introduces a degree of perceptual error. Treemaps are the rectangular cousin of bubble charts with circle packing, with the area of each rectangle proportional to GDP. Of the non-geographic visualizations, the treemap is the best one so far, in that it shows the distribution in a striking Escher-like way while allowing us to see the labels for most of the big countries. But it is still not without problems. Some countries cannot be found. And the relative ordering for all but the four largest economies is hard to discern.

Treemap

Click the image for an interactive version
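To make the diameter-versus-area point concrete, here is a minimal sketch of the arithmetic. It is written in Python purely for illustration (the charts in this post were created in R), the function names and the scaling constant `k` are hypothetical, and the two values are arbitrary numbers, not GDP figures.

```python
import math


def area_when_diameter_encodes(value, k=1.0):
    """Area of a circle whose diameter is k * value."""
    return math.pi * (k * value / 2) ** 2


def diameter_when_area_encodes(value, k=1.0):
    """Diameter of a circle whose area is k * value."""
    return 2 * math.sqrt(k * value / math.pi)


small, large = 1.0, 4.0  # the larger value is 4 times the smaller

# Diameter-proportional circles: the areas differ by the square of the ratio.
area_ratio = area_when_diameter_encodes(large) / area_when_diameter_encodes(small)

# Area-proportional circles: the diameters differ by the square root of the ratio.
diameter_ratio = diameter_when_area_encodes(large) / diameter_when_area_encodes(small)

print(area_ratio, diameter_ratio)
```

When diameter encodes the value, a value 4 times bigger is drawn with roughly 16 times the area. When area encodes the value, as in a treemap, the drawn area stays in step with the data.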


Alternative 6: A donut chart (it does a surprisingly good job)

As I have mentioned before, the hatred that most numerate people have of pie charts is not justified. To my mind, the donut chart below outperforms all the non-geographic visualizations examined so far. Notably, it emphasizes aspects of the data not evident in any of the other visualizations. For example, it allows us to see that the combined GDP of the four biggest countries exceeds that of the rest of the world. If you want to find data for one of the countries with a smaller GDP, you can, unfortunately, only do so via tooltips.
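The comparison that the donut chart makes visible, the largest few items against everything else, is simple to compute directly. The sketch below is written in Python purely for illustration (the charts in this post were created in R), `top_share` is a hypothetical helper, and the numbers are placeholders rather than IMF GDP figures.

```python
def top_share(values, k):
    """Return the fraction of the total accounted for by the k largest values."""
    ordered = sorted(values, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Placeholder values for illustration only (not GDP data).
values = [50, 30, 10, 5, 3, 2]

share = top_share(values, 2)
print(f"Top 2 items account for {share:.0%} of the total")
```

A share above one half means the top `k` items together exceed everything else combined, which is the pattern the donut chart shows for the four largest economies.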


Alternative 7: Grid of bar charts

I call this last visualization a grid of bars. It consists of a series of bar charts next to each other. I have created each of these charts using R. Then, I laid them out and added a heading in Displayr. You can do this just as easily in PowerPoint or any design app. For a description of how I created it, see my post A Beginners Guide to Using Functions to Create Chart Templates Using R.

This visualization is not pretty, but it is the only visualization which manages to adequately convey the distribution as well as all the detail. Its only real technical limitation is that it can be hard to find a specific country (which is less of a problem in the earlier geographic visualizations).

Grid of bar charts
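The data preparation behind a grid of bars can be sketched as: sort the list from largest to smallest, then split it into equal-length columns, one bar chart per column. This is only an assumption about the general approach, not the code used for the chart above, which was built in R and laid out in Displayr. The sketch is in Python purely for illustration, `grid_columns` is a hypothetical helper, and the labels and values are placeholders.

```python
import math


def grid_columns(items, n_cols):
    """Sort (label, value) pairs by value, largest first, and split into n_cols columns."""
    ordered = sorted(items, key=lambda pair: pair[1], reverse=True)
    rows = math.ceil(len(ordered) / n_cols)  # bars per column
    return [ordered[i:i + rows] for i in range(0, len(ordered), rows)]


# Placeholder labels and values for illustration only.
items = [("a", 3), ("b", 9), ("c", 1), ("d", 5), ("e", 7)]

for column in grid_columns(items, 2):
    print(column)
```

Each inner list would then be drawn as its own bar chart, with all the charts sharing one value axis so that bar lengths remain comparable across columns.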


What have I missed?

In this post, I have shown eight different ways of visualizing long lists of data. Do you know of any better methods? If so, please add a comment.


Explore the visualizations yourself

You can log into Displayr and access the document used to create each of these visualizations here (just sign in first). To see the R code, click on a visualization and then look in Properties > R CODE on the right of the screen.


Acknowledgements

The bubble charts with circle packing use Joe Cheng’s bubbles package. The cartogram, choropleth, horn of plenty, and grid of bars use plotly. The treemap uses canvasXpress.

 

Author: Tim Bock

Tim Bock is the founder of Displayr. Tim is a data scientist who has consulted, published academic papers, and won awards for problems and techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q (www.qresearchsoftware.com), a data science product designed for survey research, which is used by all of the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.



6 Comments. Share your thoughts.

  1. Alex Zolot

    You missed Pareto Plot.


    • Tim Bock

      I haven’t seen a Pareto chart for this type of data. It would be great if you could post a link to what you have in mind. (I am used to Pareto charts as an alternative to histograms, for market concentration analyses, and for showing predictive accuracy.)


  2. Ronán Conroy

    Thank you for a thought-provoking piece. I agree with you that the donut plot showed a lot of promise. While the grid of bar charts shows a lot of information, I think you are falling into the trap of trying to graph a table. The donut chart does not identify the countries with tiny GDP because they don’t count! I think the central message of the chart is that a handful of countries account for more than half the world’s GDP. This comes across perfectly.
    If you wanted to display the GDP of all the world’s countries, then a table is your best bet.


    • Gaurav Jain

      It is a fair point. I do think that the one advantage that the grid of bars has over a table is you can quickly see the biggest points, but for the vast majority of the data the chart is more of a distraction than an aid.


  3. Mark McKeever

    Along the same lines as Ronán Conroy, and Gaurav Jain, the issue is with this large data set, it is the disparity between the largest components and the smaller. How much information are you expecting the user to take in or care about?

    This needs an interactive visualisation akin to zooming on a map, where when zoomed out, the largest are visible, but should the user need to see the other end of the scale they could zoom in to see those countries and exclude the larger components (reducing your data set).

    With the horn of plenty, you could make it interactive, allowing the user to zoom in on a part of the horn lower down.
    For the grid of bar charts, you could exclude the top few elements and plot these separately, so that the other entries have a better (bigger) scale to enable better comparison.
    (Effectively, have two grids at the same scale, where the first grid is a single full-width column and the second is split into the remainder.)


    • Gaurav Jain

      Nice ideas. Thanks. With the zoom though, a lot of people still read things in worlds without zooms (e.g., in books, presentations), so I always think of interactivity as a great way to show additional information, but don’t want to rely upon it for the core pattern. I particularly like the idea for the Horn of Plenty. I will try the grid of bars that way next time if I remember!

