Subscribe / Unsubscribe Enewsletters | Login | Register

Pencil Banner

Beginner's guide to R: Syntax quirks you'll want to know

Sharon Machlis | June 7, 2013
Why x=3 doesn't always mean what you think it should, about data types and more.

Plain old apply() runs a function on either every row or every column of a 2-dimensional matrix where all columns are the same data type. For a 2-D matrix, you also need to tell the function whether you're applying by rows or by columns: Add the argument 1 to apply by row or 2 to apply by column. For example:

apply(my_matrix, 1, median)

returns the median of every row in my_matrix and

apply(my_matrix, 2, median)

calculates the median of every column.

Other functions in the apply() family such as lapply() or tapply() deal with different input/output data types. Australian statistical bioinformatician Neal F.W. Saunders has a nice brief introduction to apply in R in a blog post if you'd like to find out more and see some examples. (In case you're wondering, bioinformatics involves issues around storing, retrieving and organizing biological data, not just analyzing it.)

Many R users who dislike the the apply functions don't turn to for-loops, but instead install the plyr package created by Hadley Wickham. He uses what he calls the "split-apply-combine" model of dealing with data: Split up a collection of data the way you want to operate on it, apply whatever function you want to each of your data group(s) and then combine them all back together again.

The plyr package is probably a step beyond this basic beginner's guide; but if you'd like to find out more about plyr, you can head to Wickham's plyr website. There's also a useful slide presentation on plyr in PDF format from Cosma Shalizi, an associate professor of statistics at Carnegie Mellon University, and Vincent Vu. Another PDF presentation on plyr is from an introduction to R workshop at Iowa State University.

R data types in brief (very brief)
Should you learn about all of R's data types and how they behave right off the bat, as a beginner? If your goal is to be an R ninja then, yes, you've got to know the ins and outs of data types. But my assumption is that you're here to try generating quick plots and stats before diving in to create complex code.

So, to start off with the basics, here's what I'd suggest you keep in mind for now: R has multiple data types. Some of them are especially important when doing basic data work. And some functions that are quite useful for doing your basic data work require your data to be in a particular type and structure.

More specifically, R has the "Is it an integer or character or true/false?" data type, the basic building blocks. R has several of these including integer, numeric, character and logical. Missing values are represented by NaN (if a mathematical function won't work properly) or NA (missing or unavailable).


Previous Page  1  2  3  4  5  6  Next Page 

Sign up for CIO Asia eNewsletters.