Beginner's guide to R: Get your data into R

Sharon Machlis | June 7, 2013
Some tips on how to import data in various formats, both local and on the Web.

If your data use a separator other than a comma, R has the more general read.table function. If your separator is a tab, for instance, this would work:

mydata <- read.table("filename.txt", sep="\t", header=TRUE)

The header=TRUE argument in the command above tells R that the file's first row contains column names.

If your separator is some other character, such as |, change the separator part of the command to sep="|".
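One way to try out the sep argument without creating a file is to hand read.table a string through its text argument. The sketch below does that with a small pipe-delimited sample; the data and the name pipe_text are made up for illustration.

```r
# Hypothetical pipe-delimited sample data (not from the article)
pipe_text <- "company|employees|rating
Acme|120|good
Globex|45|average"

# text= lets read.table parse a string instead of a file;
# sep="|" names the separator, header=TRUE treats the first line as column names
mydata <- read.table(text = pipe_text, sep = "|", header = TRUE)

print(mydata)
str(mydata)
```

Swapping text = pipe_text for a file name gives the same command you would use on a real file.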

Categories or values? Because of R's roots as a statistical tool, when you import non-numerical data, R may assume that character strings are statistical factors -- things like "poor," "average" and "good" -- or "success" and "failure."

But your text columns may not be categories that you want to group and measure, just names of companies or employees. If you don't want your text data to be read in as factors, add stringsAsFactors=FALSE to read.table, like this:

mydata <- read.table("filename.txt", sep="\t", header=TRUE, stringsAsFactors=FALSE)
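To see what the setting changes, the sketch below reads the same text twice and compares the class of the text column. The sample data and variable names are hypothetical, and stringsAsFactors is set explicitly in both calls so the result does not depend on the default in your version of R.

```r
# Hypothetical tab-separated sample data (not from the article)
tab_text <- "name\tscore\nAcme Corp\t3\nGlobex\t5\nAcme Corp\t4"

# Same input, read once as factors and once as plain character strings
as_factors <- read.table(text = tab_text, sep = "\t", header = TRUE,
                         stringsAsFactors = TRUE)
as_strings <- read.table(text = tab_text, sep = "\t", header = TRUE,
                         stringsAsFactors = FALSE)

class(as_factors$name)   # "factor"
class(as_strings$name)   # "character"
levels(as_factors$name)  # the distinct categories R found in the column
```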

If you'd prefer, RStudio lets you load data with a series of menu clicks instead of typing a read command at the console as just described. To do this, go to the Workspace tab of RStudio's upper-right window, find the menu option to "Import Dataset," then choose a local text file or URL.

As the data are imported, the R command that RStudio generates from your menu clicks will appear in your console. You may want to save that data-reading command into a script file if you're using this for significant analysis work, so that others -- or you -- can reproduce that work.

The 3-minute YouTube video below, recorded by UCLA statistics grad student Miles Chen, shows an RStudio point-and-click data import.

Copying data snippets
If you've got just a small section of data already in a table -- a spreadsheet, say, or a Web HTML table -- you can control-C copy those data to your Windows clipboard and import them into R.

The command below reads tab-separated clipboard data that include a header row, and stores the data in a data frame (x):

x <- read.table(file = "clipboard", sep="\t", header=TRUE)

You can read more about using the Windows clipboard in R at the R For Dummies website.

On a Mac, pipe("pbpaste") will access data you've copied with command-C, so this will do the equivalent of the previous Windows command:

x <- read.table(pipe("pbpaste"), sep="\t", header=TRUE)
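If you move between Windows and Mac machines, the two commands can be wrapped in one function. This is only a sketch: read_clipboard_table is a hypothetical helper name, and it covers just the two systems discussed here, stopping with an error on anything else.

```r
# Hypothetical helper (not from the article): choose the clipboard source
# by operating system, then read tab-separated data with a header row.
read_clipboard_table <- function(sep = "\t", header = TRUE) {
  sysname <- Sys.info()[["sysname"]]
  if (sysname == "Windows") {
    read.table(file = "clipboard", sep = sep, header = header)
  } else if (sysname == "Darwin") {  # macOS reports itself as "Darwin"
    read.table(pipe("pbpaste"), sep = sep, header = header)
  } else {
    stop("No clipboard source defined for: ", sysname)
  }
}

# Usage: copy a few spreadsheet cells first, then run
# x <- read_clipboard_table()
```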

Other formats
There are R packages that will read files from Excel, SPSS, SAS, Stata and various relational databases. I don't bother with the Excel package; it requires both Java and Perl, and in general I'd rather export a spreadsheet to CSV in hopes of not running into Microsoft special-character problems. For more info on other formats, see UCLA's "How to input data into R," which discusses the foreign add-on package for importing several other statistical software file types.
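As a rough illustration of the foreign package, the sketch below writes a tiny Stata file to a temporary location and reads it back with read.dta. The sample data and variable names are made up, and the example assumes foreign is installed, as it is with most standard R installations. Note that read.dta handles only certain Stata file versions, so check its help page against your own files.

```r
library(foreign)  # add-on package for several statistical file formats

# Hypothetical sample data (not from the article)
survey <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Write a Stata file to a temporary path so there is something to import
dta_path <- tempfile(fileext = ".dta")
write.dta(survey, dta_path)

# read.dta returns an ordinary data frame
imported <- read.dta(dta_path)
str(imported)
```

The same package offers read.spss for SPSS files; with a real file you would skip the write step and pass your own file path.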

