
# Beginner's guide to R: Easy ways to do basic data analysis

June 7, 2013

subset(your data object, logical condition for the rows you want to return, select statement for the columns you want to return)

So, in the mtcars example, to find all rows where mpg is greater than 20 and return only those rows with their mpg and hp data, the subset() statement would look like:

subset(mtcars, mpg>20, c("mpg", "hp"))
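For comparison, the same filter can be written with bracket notation. This is a minimal sketch that assumes only the built-in mtcars data set; the variable names with_subset and with_brackets are made up for illustration.

```r
# Rows where mpg > 20, keeping only the mpg and hp columns, written two ways.
with_subset   <- subset(mtcars, mpg > 20, c("mpg", "hp"))
with_brackets <- mtcars[mtcars$mpg > 20, c("mpg", "hp")]

# Both approaches should select the same rows and columns.
all.equal(with_subset, with_brackets)

# Number of cars with mpg above 20.
nrow(with_subset)
```

One difference worth knowing: subset() drops rows where the condition evaluates to NA, while bracket indexing keeps them as rows of NAs. mtcars has no missing values, so the two agree here.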

What if you wanted to find the row with the highest mpg?

subset(mtcars, mpg==max(mpg))

If you just wanted to see the mpg information for the highest mpg:

subset(mtcars, mpg==max(mpg), mpg)
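Two details of this pattern are easy to miss: subset() returns a data frame even when you ask for a single column, and the == comparison keeps every row that ties for the extreme value. The sketch below shows both with mtcars; the names top, lowest and first_top are illustrative, and which.max() is shown as one possible alternative when you want only a single row.

```r
# subset() with a single column still returns a data frame, not a bare vector.
top <- subset(mtcars, mpg == max(mpg), mpg)
class(top)
rownames(top)   # the car(s) with the highest mpg

# The == comparison keeps every tied row; check how many cars share the lowest mpg.
lowest <- subset(mtcars, mpg == min(mpg), mpg)
nrow(lowest)

# which.max() returns the position of the first maximum only, even if there are ties.
first_top <- mtcars[which.max(mtcars$mpg), "mpg", drop = FALSE]
first_top
```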

If you just want to use subset() to extract some columns and display all rows, you can either leave the row condition empty but keep its comma, as you would in bracket notation:

subset(mtcars, , c("mpg", "hp"))

Or name the argument explicitly with select= so R knows it refers to columns, not rows:

subset(mtcars, select=c("mpg", "hp"))
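The select argument also accepts a few other forms beyond a vector of quoted names. The following is a short sketch using mtcars, with the throwaway names a, b and d chosen only for this example.

```r
# Column names in select can be written without quotes.
a <- subset(mtcars, select = c(mpg, hp))

# A colon selects a range of adjacent columns, here mpg through hp.
b <- subset(mtcars, select = mpg:hp)

# A minus sign drops the named columns and keeps everything else.
d <- subset(mtcars, select = -c(mpg, hp))

names(a)
names(b)
names(d)
```

Unquoted names and ranges are convenient at the console; quoted names are handy when the column names are stored in a variable.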

## Counting factors

To tally up counts by factor, try the table() function. For the diamonds data set, to see how many diamonds of each category of cut are in the data, you can use:

table(diamonds$cut)

This will return how many diamonds fall into each level of the cut factor (Fair, Good, Very Good, Premium and Ideal). Want to see a cross-tab by cut and color?

table(diamonds$cut, diamonds$color)
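The diamonds data set is not part of base R (it ships with the ggplot2 package), so here is a sketch of the same two calls using the built-in mtcars data instead. Using the cyl and gear columns as stand-ins for cut and color is an assumption made for this example, and the names cyl_counts and cyl_by_gear are illustrative.

```r
# table() tallies each distinct value of cyl, treating the numbers as categories.
cyl_counts <- table(mtcars$cyl)
cyl_counts

# Cross-tab: cylinder counts in rows, gear counts in columns.
cyl_by_gear <- table(mtcars$cyl, mtcars$gear)
cyl_by_gear

# Every car is counted exactly once, so the cells add up to the number of rows.
sum(cyl_by_gear) == nrow(mtcars)
```

In a cross-tab, the first argument to table() becomes the rows and the second becomes the columns.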

If you are interested in learning more about statistical functions in R and how to slice and dice your data, there are a number of free academic downloads with many more details. These include Learning Statistics with R by Daniel Navarro at the University of Adelaide in Australia (a PDF of 500+ pages, so the download may take a little while). Although they are not free, books such as R Cookbook and R in a Nutshell have a lot of good examples and well-written explanations.