
# Beginner's guide to R: Painless data visualisation

June 7, 2013

One of the most appealing things about R is its ability to create data visualizations with just a couple of lines of code.

hist(mydata$columnName, breaks = n)

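where columnName is the name of the column in the mydata data frame that you want to visualize, and n is the number of bins you want. R treats a single number for breaks as a suggestion, so the actual bin count may differ slightly.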
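As a concrete sketch of the hist() call, here it is with the built-in mtcars data frame standing in for mydata. The choice of mtcars, the mpg column and breaks = 5 are illustrative assumptions, not part of the original example.

```r
# Histogram of the mpg column in the built-in mtcars data frame.
# breaks = 5 asks for about five bins; R may adjust the count
# so that the bin edges fall on round numbers.
h <- hist(mtcars$mpg, breaks = 5,
          main = "Miles per gallon", xlab = "mpg")

# hist() invisibly returns the bins it actually used.
h$breaks   # bin edges
h$counts   # number of cars in each bin
```

Inspecting h$breaks is a quick way to check how many bins R ended up drawing.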

The ggplot2 equivalents are qplot() for quick plots:

qplot(columnName, data=mydata, binwidth=n)

and, for the more robust ggplot() (note that binwidth sets the width of each bin, not the number of bins):

ggplot(mydata, aes(x=columnName)) + geom_histogram(binwidth=n)

You may be starting to see strong similarities in syntax for various ggplot() examples. While the ggplot() function is somewhat less intuitive, once you wrap your head around its general principles, you can do other types of graphics in a similar way.
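To make the shared syntax concrete, here is a minimal sketch using the built-in mtcars data frame. It assumes the ggplot2 package is installed; the mpg column, the bin width of 5 and the density plot used for comparison are illustrative choices.

```r
library(ggplot2)  # assumes ggplot2 is installed

# Histogram of mpg from the built-in mtcars data frame,
# with each bin 5 mpg wide.
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5)
print(p)

# Same data and aesthetic mapping, different geom:
# only the layer after the + changes.
p2 <- ggplot(mtcars, aes(x = mpg)) +
  geom_density()
print(p2)
```

The data and aes() mapping stay the same in both plots, which is the general principle behind most ggplot() calls.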

There are many more graphics types in R than the few I've mentioned. Boxplots, a statistical staple showing minimum and maximum, first and third quartiles and median, have their own function called, intuitively, boxplot(). If you want to see a boxplot of the mpg column in the mtcars data frame, it's as simple as:

boxplot(mtcars$mpg)
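If you want the numbers behind the picture, boxplot() also returns them. The sketch below is one way to inspect them. By default the whiskers stop at the most extreme points within 1.5 times the interquartile range of the box, and anything farther out is reported separately as an outlier.

```r
# Draw the boxplot and keep the summary that boxplot() returns invisibly.
b <- boxplot(mtcars$mpg, ylab = "mpg")

# One column per box: lower whisker, lower hinge, median,
# upper hinge, upper whisker.
b$stats

# Points drawn individually beyond the whiskers, if any.
b$out

# Tukey's five-number summary of the same column, for comparison.
fivenum(mtcars$mpg)
```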

To see side-by-side boxplots in a single plot, such as the x, y and z measurements of all the diamonds in the diamonds sample data set included in ggplot2, pass each column as a separate argument:

boxplot(diamonds$x, diamonds$y, diamonds$z)
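The same pattern works without ggplot2 installed. As a sketch, this version swaps in the built-in iris data set (an assumption made here so the example is self-contained) and adds a names argument so each box is labeled.

```r
# Three numeric columns from the built-in iris data frame,
# drawn as side-by-side boxes in one plot.
b <- boxplot(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length,
             names = c("Sepal.Length", "Sepal.Width", "Petal.Length"),
             ylab = "cm")
```

Without names, the boxes are simply numbered 1, 2, 3 along the x axis.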

Creating a heat map in R is more complex but not ridiculously so. There's an easy-to-follow tutorial on Flowing Data.
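For a rough idea of what is involved, here is a minimal sketch using the heatmap() function in base R. It is not necessarily the approach the tutorial takes, and the mtcars data, the column scaling and the 16-color palette are all illustrative assumptions.

```r
# heatmap() needs a numeric matrix, so convert the data frame first.
m <- as.matrix(mtcars)

# scale = "column" standardizes each variable so that columns with
# large values (such as disp) do not dominate the color range.
hm <- heatmap(m, scale = "column", col = heat.colors(16))

# heatmap() reorders rows and columns by clustering;
# the new row order is returned invisibly.
rownames(m)[hm$rowInd]
```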

You can do graphical correlation matrices with the corrplot add-on package and generate numerous probability distributions. See some of the links here or in the resources section to find out more.

## Using color

Looking at nothing but black and white graphics can get tiresome after a while. Of course, there are numerous ways of using color in R.

Colors in R have both names and numbers as well as the usual RGB hex code, HSV (hue, saturation and value) specs and others. And when I say "names," I don't mean just the usual "red," "green," "blue," "black" and "white." R has 657 named colors. The colors() or colours() function (R does not discriminate against either American or British English) gives you a list of all of them. If you want to see what they look like, not just their text names, you can get a full, multi-page PDF chart with color numbers, color names and swatches, sorted in various ways. Or you can find just the names and color swatches for each.
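A short sketch of these different ways to specify a color follows. The particular colors chosen and the bar chart used to display them are illustrative assumptions.

```r
# All built-in color names; colours() is the same function.
all_names <- colors()
length(all_names)
head(all_names)
identical(colors(), colours())

# Look up the red, green and blue components of a named color.
col2rgb("steelblue")

# Names, hex codes, rgb() and hsv() values can be mixed freely
# anywhere a plotting function accepts a col argument.
barplot(rep(1, 5),
        col = c("tomato", "steelblue", "#00FF00",
                rgb(1, 0.5, 0), hsv(0.8, 1, 1)))
```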

There are also R functions that automatically generate a vector of n colors using a specific color palette such as "rainbow" or "heat":

rainbow(n)

heat.colors(n)

terrain.colors(n)

topo.colors(n)

cm.colors(n)

So, if you want five colors from the rainbow palette, use:

rainbow(5)
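The result is a character vector of hex color codes that can be passed to any plotting function's col argument. Here is a minimal sketch, with the bar charts as an illustrative assumption:

```r
# Five colors from the rainbow palette, as hex strings.
pal <- rainbow(5)
pal

# Use one color per bar.
barplot(1:5, col = pal, names.arg = 1:5)

# The other palette functions work the same way.
barplot(1:5, col = heat.colors(5), names.arg = 1:5)
```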

For many more details, check the help command on a palette such as: