Syntax cheating: Run SQL queries in R
If you've got SQL experience and R syntax starts giving you a headache -- especially when you're trying to figure out how to get a subset of data with proper R syntax -- you might start longing for the ability to run a quick SQL SELECT command to query your data set.
The add-on package sqldf lets you run SQL queries on an R data frame (there are separate packages allowing you to connect R with a local database). Install and load sqldf, and then you can issue commands such as:
sqldf("select * from mtcars where mpg > 20 order by mpg desc")
This will find all rows in the mtcars sample data frame that have an mpg greater than 20, ordered from highest to lowest mpg.
Most R experts will discourage newbies from "cheating" this way: Falling back on SQL makes it less likely you'll power through learning R syntax. However, it's there for you in a pinch -- or as a useful way to double-check whether you're getting back the expected results from an R expression.
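As a rough sketch of that double-checking idea, the query above can be compared against a base R expression that is meant to do the same thing. The object names sql_result and r_result are made up for this illustration, and the example assumes sqldf is already installed:

```r
library(sqldf)  # assumes install.packages("sqldf") has already been run

# The SQL version from the example above
sql_result <- sqldf("select * from mtcars where mpg > 20 order by mpg desc")

# A base R attempt at the same subset: filter rows, then sort by mpg descending
r_result <- mtcars[mtcars$mpg > 20, ]
r_result <- r_result[order(r_result$mpg, decreasing = TRUE), ]

# Compare the two: same number of rows, same mpg values in the same order
nrow(sql_result) == nrow(r_result)
all.equal(sql_result$mpg, r_result$mpg)
```

Only the mpg column is compared here because sqldf returns a data frame without the original row names (the car models), and rows with tied mpg values may come back in a different order.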
Examine and edit data with a GUI
And speaking of cheating, if you don't want to use the command line to examine and edit your data, R has a couple of options. The edit() function brings up an editor where you can look at and edit an R object such as a data frame.
[Figure: Invoking R's data editing window with the edit() function.]
This can be useful if you've got a data set with a lot of columns that are wrapping in the small command-line window. However, since there's no way to save your work as you go along -- changes are saved only when you close the editing window -- and there's no command-history record of what you've done, the edit window probably isn't your best choice for editing data in a project where it's important to repeat/reproduce your work.
In RStudio you can also examine a data object (although not edit it) by clicking on it in the workspace tab in the upper right window.
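A minimal sketch of both options from the console follows. The name cars_copy is invented for this example. Note that edit() returns the edited object rather than changing the original, so the result has to be assigned to something. The calls are wrapped in interactive() because they open windows and only make sense in an interactive session:

```r
# Work on a copy so the built-in mtcars sample data stays untouched
cars_copy <- mtcars

if (interactive()) {
  # Opens the data editor; changes are kept only when the window is closed
  # and only because the result is assigned back to cars_copy
  cars_copy <- edit(cars_copy)

  # Opens a read-only viewer for the same object
  View(cars_copy)
}
```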
Saving and exporting your data
In addition to saving your entire R workspace with the save.image() function and various ways to save plots to image files, you can save individual objects for use in other software. For example, if you've got a data frame just so and would like to share it with colleagues as a tab- or comma-delimited file, say for importing into a spreadsheet, you can use the command:
write.table(myData, "testfile.txt", sep="\t")
This will export all the data from an R object called myData to a tab-separated file called testfile.txt in the current working directory. Changing sep="\t" to sep="," will generate a comma-separated file, and so on.
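Here is a small sketch of both variants that also reads the files back in to confirm they were written as expected. myData is filled with a few rows of mtcars purely as a stand-in, and the files go to a temporary directory (rather than the working directory) so the example doesn't leave anything behind; the file name testfile.csv is an assumption for illustration:

```r
# Stand-in data for the myData object used in the text
myData <- head(mtcars, 5)

# Write to a temporary directory instead of the current working directory
tab_file <- file.path(tempdir(), "testfile.txt")
csv_file <- file.path(tempdir(), "testfile.csv")

# Tab-separated, as in the command above
write.table(myData, tab_file, sep = "\t")

# Comma-separated; col.names = NA adds a blank header cell above the
# row-names column so the header lines up when opened in a spreadsheet
write.table(myData, csv_file, sep = ",", col.names = NA)

# Read both files back to check the round trip
tab_back <- read.table(tab_file, sep = "\t", header = TRUE)
csv_back <- read.csv(csv_file, row.names = 1)
```

For the comma-separated case, write.csv(myData, csv_file) is a convenience wrapper that produces the same kind of file.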