Beginner's guide to R: Painless data visualisation

Sharon Machlis | June 7, 2013
One of the most appealing things about R is its ability to create data visualizations with just a couple of lines of code.

ggplot(mydata, aes(x=xcol, y=ycol)) + geom_line() + expand_limits(y=0)

Perhaps you'd like both lines and points on that temperature vs. pressure graph?

ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line() + geom_point()

The point here (pun sort of intended) is that you can start off with a simple graphic and then add all sorts of customizations: Set the size, shape and color of the points, plot multiple lines with different colors, add labels and a ton more. See Bar and line graphs (ggplot2) for a few examples, or the R Graphics Cookbook by Winston Chang for many more.
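
As a rough illustration of those customizations, here is a minimal sketch using the same pressure data set. The specific colors, point shape and label text are arbitrary choices made for this example, not settings from the article.

```r
library(ggplot2)

# pressure ships with R: temperature (deg C) vs. vapor pressure of mercury (mm Hg)
p <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line(colour = "steelblue") +
  # shape 17 is a filled triangle; size and colour are set for every point
  geom_point(size = 3, shape = 17, colour = "darkred") +
  labs(title = "Vapor pressure of mercury",
       x = "Temperature (deg C)", y = "Pressure (mm Hg)")

print(p)  # draws the plot on the active graphics device
```

Because each customization is a separate layer or setting added with +, you can remove or change one without touching the rest.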

Bar graphs
To make a bar graph from the sample BOD data frame included with R, base R provides the barplot() function. So, to plot the demand column from the BOD data set on a bar graph, you can use the command:

barplot(BOD$demand)


Add main="Graph of demand" if you want a main headline on your graph:

barplot(BOD$demand, main="Graph of demand")

To label the bars on the x axis, use the names.arg argument and set it to the column you want to use for labels:

barplot(BOD$demand, main="Graph of demand", names.arg = BOD$Time)
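
Going one step further, barplot() also returns the x positions of the bars it drew, which can be handy for adding text to the graph. The sketch below assumes you want axis titles and the value printed above each bar; the label wording and the ylim range are choices made for this example.

```r
# barplot() invisibly returns the x position of each bar's midpoint
mids <- barplot(BOD$demand,
                main = "Graph of demand",
                names.arg = BOD$Time,
                xlab = "Time (days)",
                ylab = "Demand (mg/l)",
                ylim = c(0, 25))   # leave headroom above the tallest bar

# write each value just above its bar (pos = 3 means "above")
text(x = mids, y = BOD$demand, labels = BOD$demand, pos = 3)
```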

Sometimes you'd like to graph the counts of a particular variable but you've got just raw data, not a table of frequencies. R's table() function is a quick way to generate counts for each distinct value in your data.

The R Graphics Cookbook uses an example of a bar graph for the number of 4-, 6- and 8-cylinder vehicles in the mtcars data set. Cylinders are listed in the cyl column, which you can access in R using mtcars$cyl.

Here's code to get the count of how many entries there are by cylinder with the table() function; it stores results in a variable called cylcount:

cylcount <- table(mtcars$cyl)

That creates a table called cylcount containing:

 4  6  8
11  7 14
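
If you want to work with those counts rather than just print them, a few base R calls pull the pieces apart. This is a small sketch for exploration, not something the article itself walks through.

```r
cylcount <- table(mtcars$cyl)

names(cylcount)          # the distinct values, stored as character strings
as.vector(cylcount)      # the counts alone, without the labels
cylcount["8"]            # the count for one value, looked up by name
as.data.frame(cylcount)  # two columns, Var1 and Freq, for functions that expect a data frame
```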

Now you can create a bar graph of the cylinder count:

barplot(cylcount)


ggplot2's qplot() quick plotting function can also create bar graphs:

qplot(mtcars$cyl)


However, qplot() treats cyl as a continuous numeric variable that could take any value from 4 through 8, so the graph leaves empty positions for 5 and 7.

To treat cylinders as distinct groups (that is, you've got a group with 4 cylinders, a group with 6 and a group with 8, not the possibility of entries anywhere between 4 and 8), you want cylinders to be treated as a statistical factor:

qplot(factor(mtcars$cyl))


To create a bar graph with the more robust ggplot() function, you can use syntax such as:

ggplot(mtcars, aes(factor(cyl))) + geom_bar()
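
To see what factor() is doing, and to give the graph readable axis titles, here is a slightly expanded sketch of the same plot. The fill color and label text are assumptions made for this example.

```r
library(ggplot2)

# factor() turns the numeric column into three discrete groups
levels(factor(mtcars$cyl))

p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(fill = "steelblue") +   # geom_bar() counts the rows in each group
  labs(x = "Cylinders", y = "Number of cars")

print(p)  # draws the plot on the active graphics device
```

Since geom_bar() does the counting itself, there is no need to call table() first when you plot this way.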

Histograms work pretty much the same, except you want to specify how many buckets or bins you want your data to be separated into. For base R graphics, use:

hist(mydata$columnName, breaks = n)
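
As a concrete sketch of a base R histogram with a requested bin count, the example below uses the mpg column of mtcars. That column and the value 10 are assumptions chosen for illustration; the text does not name a data set here.

```r
# mtcars$mpg is an assumed example column, not one named in the text
h <- hist(mtcars$mpg, breaks = 10,
          main = "Histogram of miles per gallon", xlab = "mpg")

# a single number for breaks is only a suggestion; R picks "pretty" cut points,
# so inspect what was actually used
h$breaks
h$counts
```

If you need exact bin edges, breaks also accepts a vector of cut points instead of a single number.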

