
Beginner's guide to R: Easy ways to do basic data analysis

June 7, 2013

?median

The function description should say whether the na.rm argument is needed to exclude missing values.

Checking a function's help files -- even for simple functions -- can also uncover additional useful options, such as an optional trim argument for mean() that lets you exclude some outliers.
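To see what those two arguments do, here is a minimal sketch. The vector name scores and its values are made up for illustration; they don't come from any data set used in this guide.

```r
# Hypothetical data: one missing value (NA) and one extreme value (100)
scores <- c(2, 4, 4, 5, 6, 7, 100, NA)

median(scores)                  # NA, because of the missing value
median(scores, na.rm = TRUE)    # 5, once the NA is dropped

mean(scores, na.rm = TRUE)      # pulled upward by the 100

# trim = 0.2 discards the lowest and highest 20% of the non-missing
# values before averaging
trimmed <- mean(scores, trim = 0.2, na.rm = TRUE)
trimmed                         # 5.2
```

With seven non-missing values, a trim of 0.2 drops one value from each end, so the 100 no longer affects the result.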

Not all R functions need a robust data set to be useful for statistical work. For example, how many ways can you select a committee of 4 people from a group of 15? You can pull out your calculator and compute 15! divided by the product of 4! and 11! ... or you can use the R choose() function:

choose(15,4)
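If you'd like to confirm that choose() matches the calculator approach, the following sketch compares the two using base R's factorial() function. The variable names n_ways and by_hand are just illustrative.

```r
n_ways <- choose(15, 4)

# The same count from the factorial formula: 15! / (4! * 11!)
by_hand <- factorial(15) / (factorial(4) * factorial(11))

n_ways                                # 1365
isTRUE(all.equal(n_ways, by_hand))    # TRUE
```

all.equal() is used rather than == because factorial() works in floating point, so the two results could differ by a tiny rounding error.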

Or, perhaps you want to see all of the possible pair combinations of a group of 5 people, not simply count them. You can create a vector with the people's names and store it in a variable called mypeople:

mypeople <- c("Bob", "Joanne", "Sally", "Tim", "Neal")

In the example above, c() is the combine function.

Then run the combn() function, which takes two arguments -- your entire set first and then the number you want to have in each group:

combn(mypeople, 2)

Use the combn() function to see all possible combinations from a group.

Most experienced R users would probably combine these two steps into one, like this:

combn(c("Bob", "Joanne", "Sally", "Tim", "Neal"),2)

But separating the two can be more readable for beginners.
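It can also help to know the shape of what combn() returns. The sketch below assumes the same mypeople vector; the names people_pairs, pairs_by_row and pair_labels are made up for illustration.

```r
mypeople <- c("Bob", "Joanne", "Sally", "Tim", "Neal")

# combn() returns a matrix in which each column is one combination
people_pairs <- combn(mypeople, 2)
dim(people_pairs)     # 2 rows and 10 columns, matching choose(5, 2)

# Transpose so that each pair is a row, which some find easier to read
pairs_by_row <- t(people_pairs)

# The optional FUN argument applies a function to each combination;
# here paste() collapses each pair into a single string
pair_labels <- combn(mypeople, 2, FUN = paste, collapse = " & ")
pair_labels[1]        # "Bob & Joanne"
```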

Get slices or subsets of your data
Maybe you don't need correlations for every column in your data frame and you just want to work with a couple of columns, not 15. Perhaps you want to see only the data that meets a certain condition, such as values within 3 standard deviations of the mean. R lets you slice your data sets in various ways, depending on the data type.

To select just certain columns from a data frame, you can either refer to the columns by name or by their location (i.e., column 1, 2, 3, etc.).

For example, the mtcars sample data frame has these column names: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb.
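As a rough sketch of the two approaches, base R's bracket notation accepts either a vector of column names or a vector of column positions. The object names by_name and by_position are illustrative only.

```r
# Select columns by name
by_name <- mtcars[, c("mpg", "cyl")]

# Select the same columns by position (mpg is column 1, cyl is column 2)
by_position <- mtcars[, 1:2]

identical(by_name, by_position)   # TRUE
```

Selecting by name tends to be safer if the column order might change; selecting by position is quicker to type.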

Can't remember the names of all the columns in your data frame? If you just want to see the column names and nothing else, you can skip functions such as str(mtcars) and head(mtcars) and type:

names(mtcars)

That's handy if you want to store the names in a variable, perhaps called mtcars.colnames (or anything else you'd like to call it):

mtcars.colnames <- names(mtcars)
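One possible use for the stored names, sketched here as an example rather than a recommendation, is to reuse them when picking columns. The variable first_three is hypothetical.

```r
mtcars.colnames <- names(mtcars)

length(mtcars.colnames)    # 11 column names
mtcars.colnames[1:3]       # "mpg" "cyl" "disp"

# Reuse the stored names to pull out the first three columns
first_three <- mtcars[, mtcars.colnames[1:3]]
names(first_three)         # "mpg" "cyl" "disp"
```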

But back to the task at hand. To access only the data in the mpg column in mtcars, you can use R's dollar sign notation: