Beginner's guide to R: Easy ways to do basic data analysis

Sharon Machlis | June 7, 2013
So you've read your data into an R object. Now what?

Examine your data object
Before you start analyzing, you might want to take a look at your data object's structure and a few row entries. If it's a 2-dimensional table of data with rows and columns stored in an R data frame object, one of the more common structures you're likely to encounter, here are some ideas. Many of these also work on 1-dimensional vectors.

Many of the commands below assume that your data are stored in a variable called mydata (and not that mydata is somehow part of these functions' names).

[This story is part of Computerworld's "Beginner's guide to R." To read from the beginning, check out the introduction; there are links on that page to the other pieces in the series.]

If you type:

head(mydata)

R will display mydata's column headers and first 6 rows by default. Want to see, oh, the first 10 rows instead of 6? That's:

head(mydata, n=10)

Or just:

head(mydata, 10)
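To try head() without your own file, here's a minimal sketch that builds a small, made-up data frame. The 12-row mydata and its columns id and score are hypothetical; only head() itself comes from R:

```r
# Hypothetical example data: 12 rows, 2 columns
mydata <- data.frame(id = 1:12, score = seq(10, 120, by = 10))

head(mydata)          # first 6 rows, the default
head(mydata, n = 10)  # first 10 rows
head(mydata, 10)      # same result; n is head()'s second argument
```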

Note: If your object is just a 1-dimensional vector of numbers, such as (1, 1, 2, 3, 5, 8, 13, 21, 34), head(mydata) will give you the first 6 items in the vector.
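Here is the vector from that note, stored in a hypothetical variable called fib. In R you create it with c():

```r
# The Fibonacci-style numbers from the note, created with c()
fib <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)

head(fib)      # 1 1 2 3 5 8
head(fib, 3)   # 1 1 2
```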

To see the last few rows of your data, use the tail() function:

tail(mydata)

Or:

tail(mydata, 10)

tail() can be useful when you've read in data from an external source, helping you see whether anything got garbled (or whether there was a footnote row at the end you didn't notice).
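Here's a minimal sketch of that footnote problem. It assumes a small, made-up CSV, supplied as text so the example needs no file, whose last line is a stray note. read.csv() pads that short line with NA, and tail() makes it easy to spot:

```r
# Hypothetical CSV with a footnote accidentally left at the end
csv_text <- "city,temp
Boston,58
Denver,61
Miami,80
Source: example data"

mydata <- read.csv(text = csv_text)

# The last row is the footnote, with NA in the temp column
tail(mydata, 2)
```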

To quickly see how your R object is structured, you can use the str() function:

str(mydata)

This will tell you the type of object you have; in the case of a data frame, it will also tell you how many rows (observations in statistical R-speak) and columns (variables to R) it contains, along with the type of data in each column and the first few entries in each column.
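As a quick illustration, here's str() on a small hypothetical data frame with one integer column and one character column. The first line of output reports the object type and the row and column counts; each following line describes one column:

```r
# Hypothetical 3-row data frame: one integer column, one character column
mydata <- data.frame(x = 1:3, y = c("a", "b", "c"))

# First line: object type plus 3 obs. (rows) of 2 variables (columns);
# then one line per column showing its type and first few values
str(mydata)
```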

For a vector, str() tells you how many items there are -- for 8 items, it'll display as [1:8] -- along with the type of item (number, character, etc.) and the first few entries.
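A sketch with two hypothetical vectors, one numeric and one character, shows the type label and the [1:n] length marker:

```r
# Hypothetical vectors: 8 numbers and 2 character strings
v <- c(2.5, 3.1, 4.7, 5.0, 6.2, 7.8, 8.4, 9.9)
str(v)      # shows the type (num), the length [1:8] and the first few values

w <- c("red", "blue")
str(w)      # shows chr [1:2] followed by the values
```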

Other object types, such as lists and factors, produce somewhat different str() output.
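For example, with a hypothetical list and a hypothetical factor (R's type for categorical data), str() reports the number of elements or levels instead of a row count:

```r
# Hypothetical list with two elements of different types
lst <- list(name = "Boston", temps = c(58, 61, 60))
str(lst)    # starts with "List of 2", then one line per element

# Hypothetical factor (categorical variable)
f <- factor(c("low", "high", "low"))
str(f)      # reports the number of levels, the level labels and the integer codes
```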

If you want to see just the column names in the data frame called mydata, you can use the command:

colnames(mydata)
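A minimal sketch with a hypothetical two-column data frame. For data frames, names() returns the same result as colnames():

```r
# Hypothetical two-column data frame
mydata <- data.frame(city = c("Boston", "Denver"), temp = c(58, 61))

colnames(mydata)   # "city" "temp"
names(mydata)      # for a data frame, names() returns the same column names
```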

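Likewise, if you're interested in the row names, which are labels attached to each row rather than an ordinary column (unless you've set them, for instance with read.csv()'s row.names argument, they're simply "1", "2", "3" and so on), use: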

rownames(mydata)
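Here's a sketch, using made-up data, of where row names come from. A data frame built from scratch gets default row names, while reading CSV text with row.names = 1 moves the file's first column into the row names:

```r
# Hypothetical data frame with default row names
mydata <- data.frame(city = c("Boston", "Denver", "Miami"), temp = c(58, 61, 80))
rownames(mydata)    # "1" "2" "3"

# Hypothetical CSV text; row.names = 1 turns its first column into row names
csv_text <- "city,temp
Boston,58
Denver,61
Miami,80"
mydata2 <- read.csv(text = csv_text, row.names = 1)
rownames(mydata2)   # "Boston" "Denver" "Miami"
colnames(mydata2)   # "temp" (city is no longer a column)
```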

Pull basic stats from your data frame
Because R is a statistical programming platform, it's got some pretty elegant ways to extract statistical summaries from data. To extract a few basic stats from a data frame, use the summary() function:

summary(mydata)
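As a minimal sketch with a hypothetical four-row data frame, here is summary() applied to the whole data frame and to a single numeric column. For numbers, it reports the minimum, first quartile, median, mean, third quartile and maximum:

```r
# Hypothetical four-row data frame
mydata <- data.frame(city = c("Boston", "Denver", "Miami", "Austin"),
                     temp = c(58, 61, 80, 75))

summary(mydata)        # a block of stats for each column
summary(mydata$temp)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max. for one numeric column
```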

 
