Beginner's guide to R: Get your data into R

Sharon Machlis | June 7, 2013
Some tips on how to import data in various formats, both local and on the Web.

If you'd like to connect R to a database, there are several dedicated packages, such as RPostgreSQL, RMySQL, RMongo, RSQLite and RODBC.

(You can see the entire list of available R packages at the CRAN website.)
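As a rough illustration of the database route, here is a minimal sketch using RSQLite through the DBI interface. It assumes both packages are installed from CRAN, and it uses an in-memory database filled with R's built-in mtcars data, so no database server or file is needed. The table name and query are only for demonstration.

```r
# Assumes the DBI and RSQLite packages are installed:
# install.packages(c("DBI", "RSQLite"))
library(DBI)

# Open a temporary in-memory SQLite database (nothing is written to disk)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy R's built-in mtcars data frame into the database as a table
dbWriteTable(con, "mtcars", mtcars)

# Run a SQL query; the result comes back as an ordinary R data frame
heavy <- dbGetQuery(con, "SELECT mpg, cyl, wt FROM mtcars WHERE wt > 4")
print(heavy)

# Close the connection when finished
dbDisconnect(con)
```

With a server-based database the pattern is usually the same (connect, query, disconnect); mainly the driver passed to dbConnect() and its connection details change.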

Remote data
read.csv() and read.table() work much the same way with files on the Web as they do with local data: you give them a URL instead of a path on your own computer.
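To show that only the file argument changes, the sketch below writes a tiny CSV to a temporary file so it runs offline, then reads it back with read.csv(). The URL in the last comment is a placeholder, not a real data source.

```r
# Create a small CSV in a temporary file so the example needs no network
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "a,1", "b,2"), path)

# Read the local file into a data frame
local_data <- read.csv(path)
str(local_data)

# For a file on the Web the call is identical; only the argument changes.
# Placeholder URL, for illustration only:
# web_data <- read.csv("https://example.com/data.csv")
```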

Do you want Google Spreadsheets data in R? You don't have to download the spreadsheet to your local system as a CSV file first. Instead, in your Google spreadsheet -- properly formatted with just one header row and then one row per data record -- select File > Publish to the Web. (This makes the data public, although only to someone who has or stumbles upon the correct URL. Be careful with this, especially with sensitive data.)

Select the sheet with your data and click "Start publishing." You should see a box with the option to get a link to the published data. Change the format type from Web page to CSV and copy the link. Now you can read those data into R with a command such as:

mydata <- read.csv("http://bit.ly/10ER84j")

The command structure is the same for any file on the Web. For example, Pew Research Center data about mobile shopping are available as a CSV file for download. You can store the data in a variable called pew_data like this:

pew_data <- read.csv("http://bit.ly/11I3iuU")

It's important to make sure the file you're downloading is in an R-friendly format first: in other words, that it has at most one header row, with each subsequent row holding exactly one data record. Even well-formed government data might include lots of blank rows followed by footnotes -- that's not what you want in an R data table if you plan on running statistical analysis functions on the file.
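If a file does end with blank lines and footnotes, one way to clean it up in R is sketched below. The text is a hypothetical stand-in for a downloaded file (the region names and values are made up), read through read.csv()'s text argument so the example runs offline. read.csv() skips blank lines by default and pads short footnote lines with missing values, so dropping incomplete rows removes them.

```r
# Hypothetical stand-in for a downloaded file: one header row, two data
# records, a blank line and two footnote lines (values are made up)
raw <- "region,value
North,12
South,15

Source: hypothetical survey
Note: values are made up for illustration"

# Blank lines are skipped by default; each footnote becomes a row whose
# region holds the footnote text and whose value is NA
df <- read.csv(text = raw, stringsAsFactors = FALSE)

# Keep only rows with no missing values, which drops the footnote rows
clean <- df[complete.cases(df), ]
print(clean)
```

If you know in advance how many records precede the footnotes, read.csv()'s nrows argument (for example, nrows = 2 here) is a simpler alternative, and the skip argument can drop extra lines at the top of a file.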

Help with external data
R enthusiasts have created add-on packages to help other users download data into R with a minimum of fuss.

For instance, the financial analysis package quantmod, developed by quantitative software analyst Jeffrey Ryan, makes it easy not only to pull in and analyze stock prices but also to graph them.

All you need are four short lines of code to install the quantmod package, load it, retrieve a company's stock prices and then chart them using the barChart function. Type in and run the following in your R editor window or console for Apple data:

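install.packages("quantmod")   # install the package (needed only once)
library("quantmod")            # load it into the current R session
getSymbols("AAPL")             # retrieve Apple's prices into an object named AAPL
barChart(AAPL)                 # draw a bar chart of those prices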

Want to see just the last couple of weeks? You can use a command like this:
