26 April 2017 | by Tim Bock

Labeled Scatter Plots and Bubble Charts in R

 

Image-labelled soda can trend arrows

The rhtmlLabeledScatter R package, available on GitHub, attempts to solve three challenges with labeled scatter plots: readability with large numbers of labels, bubbles, and the use of images.

 


Four solutions for overlapping labels

1. Automatically arranging labels so they do not overlap

If you look at the scatter plot below, you should immediately see the most obvious way that the package deals with overlapping labels: labels are automatically re-arranged so that they do not overlap. Lines connect labels to their points.
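
For readers who want to try this themselves, here is a minimal sketch of a labeled scatter plot. It assumes the package has been installed from GitHub and that `LabeledScatter` takes the point labels through an argument called `label`. The `X` and `Y` arguments appear in the comments on this post, but `label` is an assumption, so check the package documentation. The `cars` data frame and `widget` variable are just illustrative names.

```r
# Build a small data set with one text label per point.
# mtcars ships with base R; its row names are the car models.
cars <- data.frame(
  mpg   = mtcars$mpg,
  hp    = mtcars$hp,
  model = rownames(mtcars),
  stringsAsFactors = FALSE
)

# Only build the plot if the package is available.
# Installing it from GitHub is assumed and not shown here.
widget <- NULL
if (requireNamespace("rhtmlLabeledScatter", quietly = TRUE)) {
  # ASSUMPTION: `label` is the argument that supplies the point labels.
  widget <- rhtmlLabeledScatter::LabeledScatter(
    X = cars$mpg,
    Y = cars$hp,
    label = cars$model
  )
}

# Evaluating `widget` at the console (or print(widget)) displays the plot.
```

With 32 labels in a fairly small plotting area, this is the kind of data where automatic label placement makes a visible difference.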


2. Allowing viewers to move labels using drag-and-drop

The second option for dealing with overlapping labels is that they are draggable. If you are viewing this visualization on a device with a mouse, you can click and drag the labels to rearrange them and make them even more readable. If you do this using a software platform that can remember the state of an HTMLwidget, such as Displayr, the final position where you leave each label is remembered.


3. Labels can be dragged off the plot

The third option is that you can drag the labels off the plot, which causes them to be added to a legend. A notation on the relevant axis shows the direction of any removed labels (try this for yourself).


4. Tooltips on hover

The fourth option for addressing overlapping labels is the use of tooltips. Hover your mouse over any point and you can see its label.


Bubble charts

The four tools for addressing overlapping labels are also all available for bubble charts, as illustrated below.
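
As a rough sketch, a bubble chart can be produced by passing a third variable as `Z`, which sets the bubble sizes. The call mirrors the one quoted in the comments on this post; the `bubbles` data frame and `widget` variable are illustrative names, and the package is assumed to be installed from GitHub.

```r
# Three numeric columns from mtcars (bundled with base R):
# mpg and hp position each bubble, cyl sets its size.
bubbles <- data.frame(
  mpg = mtcars$mpg,
  hp  = mtcars$hp,
  cyl = mtcars$cyl
)

# Only build the chart if the package is available.
widget <- NULL
if (requireNamespace("rhtmlLabeledScatter", quietly = TRUE)) {
  widget <- rhtmlLabeledScatter::LabeledScatter(
    X = bubbles$mpg,
    Y = bubbles$hp,
    Z = bubbles$cyl
  )
}

# Evaluating `widget` at the console (or print(widget)) displays the chart.
```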

 


Images

It is possible to use images on the scatter plots. Automatically rearranging the images avoids overlaps, as shown in the example below.

 


Trend arrows

The last example, shown below, uses trend arrows to show movement over time on the scatter plot.

 


The source code

Log in to Displayr to access the R source code (click on a chart and, from the object inspector, select Properties > R CODE).

Author: Tim Bock

Tim Bock is the founder of Displayr. Tim is a data scientist, who has consulted, published academic papers, and won awards, for problems/techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q (www.qresearchsoftware.com), a data science product designed for survey research, which is used by all the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.




3 Comments. Share your thoughts.

  1. Rick Pack

    Thank you. I’m going to play with the bubble chart feature today.


  2. Rick Pack

    The Github examples made it very easy for me to generate my first bubble plot. Might there be a way to adjust the boundaries of the legend? I am seeing sometimes single decimal place numbers instead of whole numbers even when Z is a variable with only whole numbers a la:
    LabeledScatter(X = mtcars$mpg, Y = mtcars$hp, Z=mtcars$cyl)


    • Chris Facer

      Hi Rick,

      Thanks for the feedback. The reason you get decimal numbers appearing in this example is because the legend values are interpolated based on the range of values, rather than by identifying the discrete values in the data. It picks a set of values which give evenly-spaced circles. I’ll pass your feedback on to the developers.

      Chris

