

Beginner's guide to R: Painless data visualisation

Sharon Machlis | June 7, 2013
One of the most appealing things about R is its ability to create data visualizations with just a couple of lines of code.

library(ggplot2)  # loads qplot() and ggplot()
qplot(disp, mpg, data=mtcars)

generates a scatterplot of miles per gallon (mpg) against engine displacement (disp) from R's built-in mtcars data set.

By default, qplot() picks a y axis range that fits your data, so the axis usually starts near your lowest y value rather than at 0. You might prefer to start the y axis at 0 so you can better see whether changes are truly meaningful; starting a graph's y axis at your lowest value instead of 0 can sometimes exaggerate changes.

Use the ylim argument to manually set your lower and upper y axis limits:

qplot(disp, mpg, ylim=c(0,35), data=mtcars)
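If you'd rather not hard-code the 35, here is a minimal sketch (an illustration, not part of the original walkthrough) that computes the upper limit from the data, rounded up to the next multiple of 5. This matters because ggplot2 drops points that fall outside the limits you set, with a warning, so a limit derived from the data can't accidentally hide observations.

```r
library(ggplot2)

# Round the largest mpg value up to the next multiple of 5 so the axis
# runs from 0 to a tidy number above every data point
upper <- ceiling(max(mtcars$mpg) / 5) * 5

# Same plot as above, but the upper y limit comes from the data
qplot(disp, mpg, ylim = c(0, upper), data = mtcars)
```

For mtcars the largest mpg value is 33.9, so this produces the same 0 to 35 axis as the hard-coded version.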

Bonus intermediate tip: On a scatterplot, you may not be able to tell whether a point represents one observation or several, especially if your data values repeat. Here's an example that ggplot2 creator Hadley Wickham generated, plotting highway (hwy) against city (cty) miles per gallon from the mpg data set that ships with ggplot2:

qplot(cty, hwy, data=mpg)

Setting geom="jitter" adds a little random noise to each point's position, so points that would otherwise sit on top of one another show up separately:

qplot(cty, hwy, data=mpg, geom="jitter")
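To check how much overlap there really is, a quick sketch can count how many rows of mpg share each (cty, hwy) pair. Because jitter is random, the sketch also fixes the random seed so the points land in the same places every time you run it; the seed value 42 is arbitrary.

```r
library(ggplot2)

# Count how many cars share each (cty, hwy) pair in the mpg data;
# a count above 1 means several observations sit on one plotted point
pair_counts <- table(paste(mpg$cty, mpg$hwy))
n_overlapping <- sum(pair_counts > 1)
print(n_overlapping)

# Jitter is random, so fix the seed to get the same point placement
# every time the code runs
set.seed(42)
qplot(cty, hwy, data = mpg, geom = "jitter")
```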

As you might have guessed, if ggplot2 has a "quick plot" function (that's what the q in qplot stands for), it also has a more robust, full-featured plotting function. That's ggplot(). Yes, while the add-on package is called ggplot2, the function is ggplot() and not ggplot2().

The code structure for a basic graph with ggplot() is a bit more complicated than with either plot() or qplot(); it goes as follows:

ggplot(mtcars, aes(x=disp, y=mpg)) + geom_point()
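Each + in that line adds a piece to a plot object, so you can also build the object step by step and draw it when you're ready. A minimal sketch (the variable name p is just an illustration): at the interactive console a ggplot object is drawn automatically, but inside a loop or a script run with source() you generally need an explicit print().

```r
library(ggplot2)

# Data set plus aesthetic mappings; no geom yet, so no data points would be drawn
p <- ggplot(mtcars, aes(x = disp, y = mpg))

# Add a point layer to the saved object
p <- p + geom_point()

# Draw it explicitly (needed inside loops and source()d scripts)
print(p)
```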

The first argument to ggplot(), mtcars, is fairly easy to understand: that's the data set you're plotting. But what's with "aes()" and "geom_point()"?

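"aes" stands for aesthetics, the visual properties of the graph: things like position in space, color and shape. Inside aes() you map columns of your data to those properties, so aes(x=disp, y=mpg) puts the disp column on the x axis and the mpg column on the y axis.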

"geom" is the geometric object used to draw your data, such as points, lines or bars.

Now if "line" and "bar" also seem like aesthetic properties to you, similar to shape, well, you can either accept that's how it works or do some deep reading into the fundamentals behind the Grammar of Graphics. (Personally, I just take Wickham's word for it.)

Want a line graph instead? Simply replace geom_point() with geom_line(), as in this example that plots pressure against temperature from R's built-in pressure data set:

ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line()
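Geoms can also be stacked: adding more than one draws each as its own layer on the same axes. A minimal sketch that draws the pressure readings as a line with a point marking each observation:

```r
library(ggplot2)

# Line layer first, then points drawn on top at each observation
p <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line() +
  geom_point()
print(p)
```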

 

It may be a little confusing here since both the data set and one of its columns are called the same thing: pressure. That first "pressure" represents the name of the data frame; the second, "y=pressure," represents the column named pressure.
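You can see the distinction for yourself by inspecting the data frame with base R's class() and names():

```r
# pressure is a data frame that ships with base R (datasets package)
class(pressure)
# [1] "data.frame"

# Its second column has the same name as the data frame itself
names(pressure)
# [1] "temperature" "pressure"
```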

In these examples, I set only the x and y aesthetics. But there are many more you could add, such as color, size and shape.
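For example, here is a minimal sketch that maps a third column to the color aesthetic, coloring each mtcars point by its number of cylinders (cyl). Wrapping cyl in factor() tells ggplot2 to treat it as three distinct groups (4, 6 and 8 cylinders) rather than a continuous number, so you get one color per group and a matching legend.

```r
library(ggplot2)

# Map cyl to color; factor() makes ggplot2 use a distinct color per group
p <- ggplot(mtcars, aes(x = disp, y = mpg, color = factor(cyl))) +
  geom_point()
print(p)
```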

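You can also change where the y axis starts in a ggplot() graph, but there ylim() works differently than in qplot(): instead of an argument inside ggplot(), it's a separate function you add to the plot with +, just like a geom. If mydata is the name of your data frame, xcol is the name of the column you want on the x axis and ycol is the name of the column you want on the y axis, start with ggplot(mydata, aes(x=xcol, y=ycol)), add a geom, then add ylim(0, NA). The NA tells ggplot2 to keep the data's own maximum as the upper limit.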
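A minimal sketch of setting a ggplot() y axis to start at 0. Here mydata, xcol and ycol are placeholder names, and the five-row data frame is made-up toy data for illustration only. Fixed limits set with ylim() drop any data outside them, so the sketch also shows expand_limits(y = 0), another ggplot2 function, which simply stretches the axis to include 0.

```r
library(ggplot2)

# Hypothetical toy data: mydata, xcol and ycol are placeholder names
mydata <- data.frame(xcol = 1:5, ycol = c(12, 15, 14, 18, 17))

# ylim() is added with +; NA for the upper limit keeps the data's maximum
p <- ggplot(mydata, aes(x = xcol, y = ycol)) +
  geom_line() +
  ylim(0, NA)
print(p)

# Alternative: expand_limits() stretches the axis to include 0 without
# imposing fixed limits that could drop out-of-range data
p2 <- ggplot(mydata, aes(x = xcol, y = ycol)) +
  geom_line() +
  expand_limits(y = 0)
print(p2)
```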

 

