Beginner's guide to R: Painless data visualisation

Sharon Machlis | June 7, 2013
One of the most appealing things about R is its ability to create data visualizations with just a couple of lines of code.

For example, it takes just one line of code — and a short one at that — to plot two variables in a scatterplot. Let's use as an example the mtcars data set installed with R by default. To plot the engine displacement column disp on the x axis and mpg on y:

plot(mtcars$disp, mtcars$mpg)

You really can't get much easier than that.
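
If you'd like to peek at the data before plotting it, here is a minimal sketch using only functions that ship with base R. It also shows the same scatterplot written with R's formula interface (y ~ x), which is an alternative spelling rather than anything the example above requires.

```r
# mtcars ships with R, so there is nothing to import or download.
str(mtcars[, c("disp", "mpg")])   # structure of the two columns being plotted
head(mtcars[, c("disp", "mpg")])  # first six rows of those columns

# The same scatterplot using the formula interface: y ~ x.
plot(mpg ~ disp, data = mtcars)
```

With the formula version, the default axis labels are the bare column names (disp and mpg) instead of mtcars$disp and mtcars$mpg.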

Of course that's a pretty no-frills graphic. If you'd like to label your x and y axes, use the parameters xlab and ylab. To add a main headline, such as "MPG compared with engine displacement," use the parameter main:

plot(mtcars$disp, mtcars$mpg, xlab="Engine displacement", ylab="mpg", main="MPG compared with engine displacement")

If you find having the y-axis labels rotated 90 degrees annoying (as I do), you can position them for easier reading with the las=1 argument:

plot(mtcars$disp, mtcars$mpg, xlab="Engine displacement", ylab="mpg", main="MPG vs engine displacement", las=1)

What's las and why is it 1? las refers to label style, and it's got four options. 0 is the default, with text always parallel to its axis. 1 is always horizontal, 2 is always perpendicular to the axis and 3 is always vertical. For much more on plot parameters, run the help command on par like so:

?par
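
To see the four las settings side by side, here is a small sketch that draws the same scatterplot once per value in a 2-by-2 grid. The names old_par and las_labels are just illustrative; mfrow is the standard par setting for laying out several plots on one page.

```r
# Draw the same scatterplot four times, once for each las value (0 to 3).
old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels; remember the old setting

las_labels <- c("0: parallel to axis", "1: horizontal",
                "2: perpendicular to axis", "3: vertical")

for (i in 0:3) {
  plot(mtcars$disp, mtcars$mpg,
       xlab = "Engine displacement", ylab = "mpg",
       main = paste("las =", las_labels[i + 1]),
       las = i)
}

par(old_par)  # restore the previous single-plot layout
```

Restoring old_par at the end keeps later plots from landing in the same four-panel grid.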

In addition to the basic dataviz functionality included with standard R, there are numerous add-on packages to expand R's visualization capabilities. Some packages are for specific disciplines such as biostatistics or finance; others add general visualization features.

Why use an add-on package if you don't need something discipline-specific? If you're doing more complex dataviz, or want to pretty up your graphics for presentations, some packages have more robust options. Another reason: The organization and syntax of an add-on package might appeal to you more than do the R defaults.

Using ggplot2
In particular, the ggplot2 package is quite popular and worth a look for robust visualizations. Its "Grammar of Graphics" approach takes a bit of time to learn, but once you've got that down, you have a tool to create many different types of visualizations using the same basic structure.

If ggplot2 isn't installed on your system yet, install it with the command:

install.packages("ggplot2")

You only need to do this once.

To use its functions, load the ggplot2 package into your current R session with the library() function. You need to do this once per R session:

library(ggplot2)
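
If you share scripts with people who may not have a package installed yet, one common pattern is to install only when needed and then load. The helper below is a hypothetical convenience function (ensure_package is not part of R or ggplot2); it is a sketch built on base R's requireNamespace(), install.packages() and library().

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # runs only when the package is not yet installed
  }
  # character.only = TRUE lets library() accept the name as a string.
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}
```

Calling ensure_package("ggplot2") at the top of a script would then combine the install and load steps described above.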

On to some ggplot2 examples.

ggplot2 has a "quick plot" function called qplot() that is similar to R's basic plot() function but adds some options. The basic quick plot code:

 
