Up until now, we have explored how to modify the parameters of graphs themselves, but what if we want to add extra elements or text to the plot?
We can use many functions, including the points()
, lines()
, or text()
functions to do just that. In all three cases, we need to provide some coordinates (x and usually y), and then provide some specific direction in terms of the plotting character (pch =
), colour (col =
), line width (lwd =
) or size (cex =
) for points; line type (lty =
) or line width (lwd =
) for lines; and labels (i.e., the text that goes at each coordinate, as for axis), size, colour, or font for text.
Before we can use these functions, however, we first need to intialize a plot to which these elements can be added.
Using the state dataset we loaded in this lesson for you, plot Income as a function of Illiteracy using the ‘data’ argument, but set the plot type to ‘n’.
plot(Income ~ Illiteracy, data = state, type = 'n')
We currently have a blank canvas—let’s use the points()
function to begin filling it in.
The syntax of the points()
function works largely like the plot function. Go ahead and add points to the plot, specifying Income as a function Illiteracy with the data argument, but subset the data such that we’re only plotting points where HS_grad is less than 50. Make these points ‘red’.
plot(Income ~ Illiteracy, data = state, type = 'n')
points(Income ~ Illiteracy, data = state[state$HS_grad < 50,], col = "red")
Now do the same thing, but instead add points for HS_grad greater than or equal to 50. Make these points blue.
plot(Income ~ Illiteracy, data = state, type = 'n')
points(Income ~ Illiteracy, data = state[state$HS_grad < 50,], col = "red")
points(Income ~ Illiteracy, data = state[state$HS_grad >= 50,], col = "blue")
Now we can see a bit more information on our graph. Specifically, how different levels of high school graduation rates interact with income and illiteracy. This is one way to code colors in your figures. You will learn another in the next lesson.
Let’s now model this relationship and add a line (you’ve done this before). First create a linear model of Income as a function of Illiteracy, being sure to set the data argument. Set this to the object m1.
m1 <- lm(Income ~ Illiteracy, data = state)
Now, use ‘abline()’ with your model to add a regression line. Set the line width to 3, and the color to ‘orange’.
plot(Income ~ Illiteracy, data = state, type = 'n')
points(Income ~ Illiteracy, data = state[state$HS_grad < 50,], col = "red")
points(Income ~ Illiteracy, data = state[state$HS_grad >= 50,], col = "blue")
abline(m1, lwd = 3, col = "orange")
A definite trend, but it can often be helpful to visualize just how far from zero our slope is. Lets add another abline to our plot, this time creating a horizontal line across our plot as a reference. Instead of giving the function an intercept and slope, we can instead specify either an ‘h’ or ‘v’ argument, meaning horizontal or vertical respectively. This will create a straight line on either the y or x axis, at the value specified.
Using abline()
, set a horizontal line at the mean of our Income variable (do this programatically). Set the line width to 3 again, and the change the line type to a dashed line (check the par help file to see how).
plot(Income ~ Illiteracy, data = state, type = 'n')
points(Income ~ Illiteracy, data = state[state$HS_grad < 50,], col = "red")
points(Income ~ Illiteracy, data = state[state$HS_grad >= 50,], col = "blue")
abline(m1, lwd = 3, col = "orange")
abline(h = mean(state$Income), lwd = 3, lty = 2)
Nice! There will be many times in your figure-making career when you might want to add a useful visual baseline or reference to emphasize a trend.
Suppose we really want to visualize where one state in particular, maybe Connecticut, lies in this relationship. We can do so by adding text at the location of the point associated with CT.
At its simplest, the text()
function takes a label (your text), and places it at a specified x,y coordinate on the plot.
Lets do it! Using a subset of ‘state’, assign to the object ‘x’ the Illiteracy value for Connecticut.
x <- state$Illiteracy[state$state_name == "Connecticut"]
Now do the same for ‘y’, creating the object from the Income value for Connecticut (by subsetting ‘state’).
y <- state$Income[state$state_name == "Connecticut"]
Now, call the ‘text’ function, giving it x, y, and your label ‘Connecticut’. Scale the text down by 0.75.
plot(Income ~ Illiteracy, data = state, type = 'n')
points(Income ~ Illiteracy, data = state[state$HS_grad < 50,], col = "red")
points(Income ~ Illiteracy, data = state[state$HS_grad >= 50,], col = "blue")
abline(m1, lwd = 3, col = "orange")
abline(h = mean(state$Income), lwd = 3, lty = 2)
text(x, y, 'Connecticut', cex = 0.75)
Nice work! Lets add one more detail to the graph. We happen to know what the colors of each point represents, but someone else reviewing your figure might not, so lets include a legend.
The ‘legend’ function, much like the other functions we’ve been playing around with, adds elements to a figure after the initial figure has been drawn.
Enter the following code: legend('topright', legend = c('HS Grad Rate < 50', 'HS Grad Rate >= 50'), col = c('red','blue'), pch = 1, cex = 0.75, bty = 'n')
plot(Income ~ Illiteracy, data = state, type = 'n')
points(Income ~ Illiteracy, data = state[state$HS_grad < 50,], col = "red")
points(Income ~ Illiteracy, data = state[state$HS_grad >= 50,], col = "blue")
abline(m1, lwd = 3, col = "orange")
abline(h = mean(state$Income), lwd = 3, lty = 2)
text(x, y, 'Connecticut', cex = 0.75)
legend('topright', legend = c('HS Grad Rate < 50', 'HS Grad Rate >= 50'), col = c('red','blue'), pch = 1, cex = 0.75, bty = 'n')
While we could specify an exact x,y location of the legend, the function accepts some handy keywords that automatically justify it to generalized location in your plot, like topleft, topright, etc… The legend =
argument represents the name of each legend entry, col =
and pch =
are the styles for each entry, and bty = 'n'
tells R that we don’t want a box around the legend.
Let’s practice once more with the segments()
function. You will need this function frequently in any scenario in which you need to display error bars (often with bar charts and strip charts).
We have a variable in state called ‘Murder_qual’, representing whether the murder rate in each state is qualitatively considered high or low. Create an object called ‘bar1’ using tapply()
to find the mean number of frost days using ‘Murder_qual’ as the index.
bar1 <- tapply(state$Frost, state$Murder_qual, mean)
Now use barplot()
to plot ‘bar1’, but set the y-axis range from 0 to 180 to leave room for our error bars. Save the plot as an object called ‘bar_loc’ (to save the x locations of each bar).
bar_loc <- barplot(bar1, ylim = c(0,180))
Now we need to calculate our sample standard error. R does not have a base function for this, but we have created our own with this lesson called std_error
. Use tapply()
on the same data and index, but find the standard error instead of the mean. Call the object ‘bar1_se’.
bar1_se <- tapply(state$Frost, state$Murder_qual, std_error)
Now enter the following code: segments(x0 = bar_loc, y0 = bar1 + bar1_se*2, x1 = bar_loc, y1 = bar1 - bar1_se*2)
barplot(bar1, ylim = c(0,180))
segments(x0 = bar_loc, y0 = bar1 + bar1_se*2, x1 = bar_loc, y1 = bar1 - bar1_se*2)
The function segments()
requires four (4) arguments, a starting x point (x0 =
), a starting y point (y0 =
), an ending x point (x1 =
) and an ending y point (y1 =
). From these 4 values it can draw a line between the starting and ending coordinates. Further, if we give segments()
a vector of values for each of these arguments, it will draw a segment for each entry in the vector (as we’ve done above).
For each segment, the starting and ending x values of the segment are just the centers of each bar (bar_loc). The starting and ending points of the y values are plus and minus 2 standard errors of our sample (2 standard errors correspond roughly to the 95% confidence interval).
Finally, so far we have just placed one plot or pane in each graphics window. However, in many cases you will want multiple similar plots side-by-side, or different kinds of plots in the same overall figure. There are several ways to do this, but the easiest way to create a set of equal-size panels is to use a call to par()
. There are two arguments here, mfrow =
and mfcol =
.
You use one of these arguments to set up the grid of panels, which you then fill, one-by-one, with separate uses of plot()
(other which ever plotting function you want). Note If you plot more figures than the number of panels available, they will start to overwrite from the beginning.
As we said, par()
has two similar arguments: mfrow
and mfcol
. Both arguments take a vector of the form c(number of rows, number of columns)
.
Using mfcol
fills this grid by colums, using mfrow
fills by rows. For example, par(mfrow = c(2, 2))
creates a graphics window with four panels, in a 2 x 2 layout.
par(mfcol = c(4, 1))
creates a column of four panels.
OK.. now for the finale. Our quest is to make a figure with two graphs, side-by-side, the first being a scatterplot and the second a boxplot. First, use a call to par()
to set the plotting window to a two-by-two pane grid. You can use either mfrow =
or mfcol =
.
par(mfrow = c(1,2))
In theory, nothing should have changed in your plot window. R has secretly set up the grid, but is waiting a call to plot()
. Plot the first graph, which in this case is Income as a function of Illiteracy. Plot this normally (dont supress the output with ‘type’). However, set the axis labels to bold font.
par(mfrow = c(1,2))
plot(Income ~ Illiteracy, data = state, font.lab = 2)
For some reason that we cannot work out, swirl plots two figures each time we call plot()
, which does not happen in regular R. So, don’t worry about that: it’s why we have a 2x2 grid!
If things do not look quite right, you may still end up with a plot. Running another call to plot()
will add in the second pane, so you may need several tries to get things looking correct, and also to have that correct graph in the first pane. That is fine.. you are learning to use R in an interactive and interative way; well done!
Now, add the second pane, a boxplot of HS_grad, with the box colour set to red and the y-axis numbers horizontal.
par(mfrow = c(1,2))
plot(Income ~ Illiteracy, data = state, font.lab = 2)
boxplot(state$HS_grad, col = 'red', las = 1)
Well done! Learning to modify graphics is even more detailed than regular R, and usually requires extensive use of the help files. Check this page for more details and examples of graphics and sparrows: http://www.simonqueenborough.info/R/explore/advanced-graphics
Please submit the log of this lesson to Google Forms so that Simon may evaluate your progress.
Delighted, I’m sure.