Bestiary of Graphics

Here, we illustrate the variety of different graphs possible in R. These are all done with the base R install. Additional packages such as lattice and ggplot2 provide more options and could be looked at once you are familiar with these.

Examples of best (and worst) practice

inspiration | despair …

Functions: hist(), barplot(), pie(), stripchart(), boxplot(), dotchart(), coplot()

Packages: vioplot, plotrix

R code: r code .txt

Learning Goals

A. Foundational Knowledge

Modify default plots using the graphical parameters function

B. Application

Make plots according to specific spublisher or thesis requirements

C. Integration & Human Dimension

Understand the presentation of data
Consider how best to present your data to tell its story

Data

Bird data.

BirdData <- data.frame(
            Tarsus  = c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5),
            Head    = c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6),
            Weight  = c(9.5, 13.8, 14.8, 15.2, 15.5, 15.6, 15.6, 15.7),
            Wingcrd = c(59, 55, 53.5, 55, 52.5, 57.5, 53, 55),
            Species = c('A', 'A', 'A', 'A', 'A',  'B', 'B', 'B')
            )

Graph Selection

Choosing the most appropriate graph for the data is one of the most important decisions you can make in presenting your results.

Here, we will look at some basic graphs, based on whether the x (predictor/independent variable) and y (response/dependent variable) are categorical or continuous:

Single continuous
Predictor and response are continuous
Single categorical
Categorical predictor and continous response

Stephen Few has a great Graph Selection Matrix

The Bestiary

For each graph type, we will look at the default appearance and some of the more useful arguments we can alter.

A single continuous variable, the histogram: hist()

**data:** single continuous variable

use: illustrating counts or distributions of data

example: bird data

Histograms present the frequency (count) or probability density distribution of a vector of data.

breaks describes the number of bins or breakpoints between bins.
freq = TRUE defaults to counts; set to FALSE to plot probability distribution.
plot = TRUE default will plot the histogram. Set to FALSE to not plot, and store the counts in an object for future use.

# set layout and margins
par(mfrow = c(2, 2), mar = c(3, 3, 2, 1), lwd = 2)

hist(BirdData$Head, main = "default histogram")
hist(BirdData$Head, breaks = 10, main = "breaks = 10")
hist(BirdData$Head, breaks = seq(from = 30, to = 33, by = 0.25), main = "breaks = seq()" )

# plot probability density distribution
hist(BirdData$Head, breaks = seq(from = 30, to = 33, by = 0.25), freq = FALSE, main = "probability density" )

Fig. Some histograms

Two continuous variables, predictor and response, the scatterplot / x-y plot: plot()

**data:** continuous response; continuous predictor.

use: illustrating relationships and correlations between continuous variables, …

example: BirdData

We have already used scatter plots (see Advanced Graphics.

Here, however, we add a legend to two of the panels, using legend().

par(mfrow = c(2, 2), mar = c(4, 4, 2, 1), lwd = 2)

plot(Head ~ Tarsus, data = BirdData,
     xlab = 'Tarsus (mm)',                    
     ylab = 'Head Size (mm)',             
     main = 'Head vs Tarsus',
     pch = 20,                            
     col = Species,  
     cex = 3)   

plot(Head ~ Wingcrd, data = BirdData,
     xlab = 'Wingcrd (m)',                    
     ylab = 'Head Size (mm)',             
     main = 'Head vs Wing',
     pch = 20,                            
     col = Species,  
     cex = 3)   

plot(Head ~ Weight, data = BirdData,
     xlab = 'Weight (kg)',                    
     ylab = 'Head Size (mm)',             
     main = 'Head vs Weight',
     pch = 20,                            
     col = Species,  
     cex = 3)   
# Here, we add a legend to this plot
  legend('topleft', 
         pch = c(20, 20), col = c(1, 2),
         legend = c('Species A', 'Species B')
        )

plot(Head ~ Weight, data = BirdData,
     xlab = 'Species',                    
     ylab = 'Head Size (mm)',             
     main = 'Head vs Species',
     pch = c(20, 23),      
     col = 1:2,  
     cex = Tarsus/5)   
# Another legend
  legend('topleft', 
         pch = c(20, 23), col = c(1, 2),
         legend = c('Species A', 'Species B')
        )

Fig. Some scatterplots

Categorical predictor/independent variable

The pie chart: pie()

A pie chart is a circular representation of counts or proportion, with the angle corresponding to the value of each category.

data: usually counts or proportion response; categorical predictor

use: never! (unless it involves real pie)

example: BirdFlu

We have annual data from the years 2003-2008. Has the total number of bird flu cases has increased over time?

Load the data and select only cases columns.

BFdata <- read.table(file = 'http://www.simonqueenborough.info/R/basic/data/birdflu_corrected.txt', header = TRUE, sep = '\t')
# select cases columns only
BFcases <- BFdata[, c('cases03', 'cases04', 'cases05', 'cases06', 'cases07', 'cases08')]
names(BFdata)

##  [1] "Country"  "cases03"  "deaths03" "cases04"  "deaths04" "cases05" 
##  [7] "deaths05" "cases06"  "deaths06" "cases07"  "deaths07" "cases08" 
## [13] "deaths08"

str(BFdata)

## 'data.frame':    15 obs. of  13 variables:
##  $ Country : Factor w/ 15 levels "Azerbaijan","Bangladesh",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ cases03 : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ deaths03: int  0 0 0 1 0 0 0 0 0 0 ...
##  $ cases04 : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ deaths04: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases05 : int  0 0 4 8 0 0 20 0 0 0 ...
##  $ deaths05: int  0 0 4 5 0 0 13 0 0 0 ...
##  $ cases06 : int  8 0 2 13 1 18 55 3 0 0 ...
##  $ deaths06: int  5 0 2 8 0 10 45 2 0 0 ...
##  $ cases07 : int  0 0 1 5 0 25 42 0 2 1 ...
##  $ deaths07: int  0 0 1 3 0 9 37 0 2 0 ...
##  $ cases08 : int  0 1 0 3 0 7 18 0 0 0 ...
##  $ deaths08: int  0 0 0 3 0 3 15 0 0 0 ...

Now we make a vector of the sum of each year’s cases using colSums(), and add names to each element using names().

Cases <- colSums(BFcases)
names(Cases) <- 2003:2008
Cases

## 2003 2004 2005 2006 2007 2008 
##    4   46   98  115   88   34

The pie() function requires a vector of non-negative integers. The names associated with each element are added as labels.

The default direction of pie() is anti-clockwise. This can be reversed setting clockwise = TRUE (the second pie chart).

For those who will sink so low, there is also a 3-D pie chart function in the ‘plotrix’ package…

par(mfrow = c(2, 2), mar = c(3, 3, 2, 1))
pie(Cases, main = "Ordinary pie chart")
pie(Cases, col = gray(seq(0.4, 1.0, length = 6)), clockwise = TRUE, main = "Grey colours")
pie(Cases, col = rainbow(6), clockwise = TRUE, main = "Rainbow colours")
#install.packages('plotrix')
library(plotrix)
pie3D(Cases, labels = names(Cases), explode = 0.1, main = "3D pie chart", labelcex = 0.6)

Fig. Some pie charts

A word of caution

Most professional statistical graphicicians recommend against using pie charts for the following reasons:

Values of all categories need to sum to a meaningful whole
Categories need to be mutually exclusive
Category ‘other’ may hide important information
It is hard for the human eye to measure angles, and therefore the size of each slice
Pie charts often require extra labels and legends to interpret (low data:ink ratio)

See here for a greater discussion of these issues.

The stacked bar plot (better than a pie chart)

An improvement on the pie chart is the stacked bar plot. This allows the viewer to more readily compare the sizes of each section because we also have an axis. But.. there are no labels (although we could add them with text() or legend()).

# To get the 'stacked' bars, we need to convert our vector to a matrix (barplot() will only stack a matrix).
barplot(as.matrix(Cases), horiz = TRUE, xlim = c(0, 400))

An improvement to the pie chart, the stacked bar plot

The barplot() function is discussed in more detail below.

Categorical predictor/independent variable and continous response variable

There are a variety of ways to plot continuous data as a function of categorical predictors.

If your response data are counts, use a bar chart: barplot()

**data:** continuous or count response; categorical or binned continuous predictor

use: comparing categories, showing distributions, …

example: BirdFluDeath

We use the bird flu data again, but this time select the deaths columns.

BFdata <- read.table(file = 'http://www.simonqueenborough.info/R/basic/data/birdflu_corrected.txt', header = TRUE, sep = '\t')
# select cases columns only
BFdeaths <- BFdata[, c('deaths03', 'deaths04', 'deaths05', 'deaths06', 'deaths07', 'deaths08')]
names(BFdeaths)

## [1] "deaths03" "deaths04" "deaths05" "deaths06" "deaths07" "deaths08"

str(BFdeaths)

## 'data.frame':    15 obs. of  6 variables:
##  $ deaths03: int  0 0 0 1 0 0 0 0 0 0 ...
##  $ deaths04: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ deaths05: int  0 0 4 5 0 0 13 0 0 0 ...
##  $ deaths06: int  5 0 2 8 0 10 45 2 0 0 ...
##  $ deaths07: int  0 0 1 3 0 9 37 0 2 0 ...
##  $ deaths08: int  0 0 0 3 0 3 15 0 0 0 ...

# Make a vector as before
Deaths <- colSums(BFdeaths)
names(Deaths) <- 2003:2008
Deaths

## 2003 2004 2005 2006 2007 2008 
##    4   32   43   79   59   26

How have the number of cases and deaths changed over time?

par(mfrow = c(2, 2), mar = c(3, 3, 2, 1), lwd = 2)
barplot(Cases, main = "default barplot of Cases")

# Create a new variable
Counts <- cbind(Cases, Deaths) # makes a matrix
barplot(Counts, main = "default with 2-column matrix" ) 
barplot(t(Counts), col = gray(c(0.5, 1)), main = "stacked") 
barplot(t(Counts), beside = TRUE, main = "beside")

Fig. Some bar plots

If your response data are continuous, also use a bar chart, but add error bars: barplot()

The bird flu data are just counts, and so we have to use a continuous data set (deer) to obtain means and variances.

First, we need to calculate the mean and standard deviation values for bird heads of each species.

head.mean <- tapply(BirdData$Head, INDEX = BirdData$Species, FUN = mean)
head.sd <- tapply(BirdData$Head, INDEX = BirdData$Species, FUN = sd)

barplot(head.mean, 
        xlab = "Species", ylab = "Head Length (mm)",
        ylim = c(0, 35),
        col = 1:2)

Bird head without error bars

Then we can plot the data and add error bars.

There are two options for creating error bars: arrows() and segments(). Both work in a similar way, drawing lines between two coordinates. For vertical error bars, you keep the x coords the same for each line, and for horizontal error bars keep the y coords the same. arrows() adds an arrow head to one end of the line, and you can set the angle to 90 degrees so that it is a flat arrow. As usual, the coordinate arguments x0, y0, x1 and y1 can take single values or vectors.

We first need to determine the x values for the lines. This is done by assigning the barplot() output to an object and then calling that. This saves the values of the midpoints of each bar.

xvals <- barplot(head.mean, plot = FALSE)
xvals

##      [,1]
## [1,]  0.7
## [2,]  1.9

We can then add the arrows to the bar plot.

barplot(head.mean, 
        xlab = "Species", ylab = "Head Length (mm)",
        ylim = c(0, 35),
        col = 1:2)

arrows(x0 = xvals, y0 = head.mean,      # arrow from ...
       x1 = xvals, y1 = head.mean + head.sd,  #  ... to 
       lwd = 2, angle = 90, length = 0.1)

Bird head with error bars

The function segments() works in an identical fashion, but has no arrow head.

Bar plots are fine, but are limited in that they do not display much of the raw data, and use a lot of ink to convey that small amount of information. There are other plots that show more of the data.

box plot: boxplot()

To provide the viewer with a much better idea of the distribution of your data, a boxplot can be used.

**data:** continuous response; categorical predictor

use:

detection of outliers
displays heterogeneity of distributions
effects of predictor variables

example: Bird data

We already used plot() to make a box plot. However, the function boxplot() has more arguments and we can modify the appearance much more.

par(mfrow = c(2, 2), lwd = 2)

plot(Head ~ Species, 
     data = BirdData, 
     main = "default boxplot with plot()")

boxplot(Head ~ Species, 
        data = BirdData, 
        main = "default boxplot with boxplot()")


boxplot(Head ~ Species, 
        data = BirdData, 
        notch = TRUE,
        main = "A notched boxplot")

## Warning in bxp(structure(list(stats = structure(c(30.3, 30.3, 30.4, 30.6, :
## some notches went outside hinges ('box'): maybe set notch=FALSE

# install.packages('vioplot')
library(vioplot)

bird.A <- BirdData$Head[BirdData$Species == 'A']
bird.B <- BirdData$Head[BirdData$Species == 'B']

#vioplot(bird.A, bird.B)

#mtext(side = 3, text = "default violin plot", line = 1.8, font = 2)

Fig. Some box plots

explanation of the box

median: thick horizontal line
box: 25th and 75th percentiles (lower and upper quartiles). upper - lower = the spread of the data
dotted line: length = 1.56 times spread
outliers: data points outside 1.56 times spread

Display raw and summary data: stripchart()

Graphical best practice is to show both raw data and summary data if possible. The stripchart() function can be used to do just that.

**data:** continuous response; categorical predictor

use: differences between categories, plotting raw data and summary data on top

example: BirdData

The default plots all data on one line, therefore we need to set method = 'jitter' to add some random y-axis variation to each point to show the raw data more clearly.

Using the mean and standard deviations we calculated before, we can then use points() and arrows() to add the summary data on top.

par(mfrow = c(2, 2), mar = c(3, 3, 2, 1), lwd = 2, las = 1, cex = 1.25)

stripchart(BirdData$Head ~ BirdData$Species, 
           xlim = c(29, 34), ylim = c(0.5, 2.5),
           main = "default strip chart",
           col = 'grey80')

stripchart(BirdData$Head ~ BirdData$Species, 
           xlim = c(29, 34), ylim = c(0.5, 2.5),
           main = "add jitter to y",
           col = 'grey80',
           method = 'jitter')

stripchart(BirdData$Head ~ BirdData$Species, 
           xlim = c(29, 34), ylim = c(0.5, 2.5),
           main = "add summary data",
           col = 'grey80',
           method = 'jitter')

  points(x = head.mean, y = 1:2, pch = 20, cex = 2)
  arrows(x0 = head.mean, y0 = 1:2,
         x1 = head.mean + head.sd, y1 = 1:2,
         angle = 90, length = 0.1, lwd = 2)
  arrows(x0 = head.mean, y0 = 1:2,
         x1 = head.mean - head.sd, y1 = 1:2,
         angle = 90, length = 0.1, lwd = 2)

Fig. Some strip charts

Resources

There are many other possibilities for graphics, depending on the nature of your data and what you want to show of it.

websites

http://blog.revolutionanalytics.com/2009/01/10-tips-for-making-your-r-graphics-look-their-best.html

Gallery of Data Visualisation

Books

R Graphics - full details of base R graphics capabilities

Updated: 2016-10-05