Beginner's guide to R: Syntax quirks you'll want to know

Sharon Machlis | June 7, 2013
Why x=3 doesn't always mean what you think it should, about data types and more.

my_vector <- (1:10)

I bring up this exception because I've run into that style quite a bit in R tutorials and texts, and it can be confusing to see the c required for some multiple values but not others. Note that it won't hurt anything to use the c with a colon-separated range, though, even if it's not required, such as:

my_vector <- c(1:10)
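A quick way to convince yourself that the two forms are interchangeable is to compare them directly. This is only a sketch in base R; the names with_parens and with_c are made up for illustration.

```r
# Two ways of writing the same range (variable names are illustrative only)
with_parens <- (1:10)
with_c <- c(1:10)

# identical() checks whether the two objects are exactly the same
print(identical(with_parens, with_c))

# Both hold the whole numbers 1 through 10
print(with_c)
```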

One more very important point about the c() function: It assumes that everything in your vector is of the same data type -- that is, all numbers or all characters. If you create a vector such as:

my_vector <- c(1, 4, "hello", TRUE)

You will not have a vector with two numeric objects, one character object and one logical object. Instead, c() will do what it can to convert them all into the same object type, in this case character. So my_vector will contain "1", "4", "hello" and "TRUE". In other words, c() is also for "convert" or "coerce."
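You can see the coercion for yourself by asking R what class the resulting vector has. The second vector below, mixed_numbers, is a hypothetical addition to show that the target type depends on what is in the mix: with no character strings present, the logical value is promoted to a number instead.

```r
# Mixing numbers, a string and a logical: everything becomes character
my_vector <- c(1, 4, "hello", TRUE)
print(my_vector)
print(class(my_vector))      # "character"

# Hypothetical second example: no strings, so TRUE is coerced to the number 1
mixed_numbers <- c(1, 4, TRUE)
print(mixed_numbers)
print(class(mixed_numbers))  # "numeric"
```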

To create a collection with multiple object types, you need a list, not a vector. You create a list with the list() function, not c(), such as:

my_list <- list(1, 4, "hello", TRUE)

Now you've got a variable that holds the number 1, the number 4, the character object "hello" and the logical object TRUE.
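To check that a list really does keep each item's original type, you can ask for the class of every element. This is a minimal sketch; element_classes and third_item are illustrative names, not part of the original example.

```r
# A list keeps each element's own type instead of coercing them
my_list <- list(1, 4, "hello", TRUE)

# sapply() runs class() on every element and collects the answers
element_classes <- sapply(my_list, class)
print(element_classes)

# Double square brackets pull out a single element with its type intact
third_item <- my_list[[3]]
print(third_item)
```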

Loopless loops
Iterating through a collection of data with loops like "for" and "while" is a cornerstone of many programming languages. That's not the R way, though. While R does have for, while and repeat loops, you'll more likely see operations applied to a data collection using the apply() family of functions or functions from the plyr add-on package.

But first, some basics.

If you've got a vector of numbers such as:

my_vector <- c(7,9,23,5)

and, say, you want to multiply each by 0.01 to turn them into percentages, how would you do that? You don't need a for, foreach or while loop. Instead, you can create a new vector called my_pct_vector like this:

my_pct_vector <- my_vector * 0.01

Performing a mathematical operation on a vector variable will automatically loop through each item in the vector.
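For comparison, here is roughly what the same calculation would look like written as an explicit loop. The loop version is shown only as a contrast, and loop_result is a made-up name; the one-line vectorized form is the one you would normally use.

```r
my_vector <- c(7, 9, 23, 5)

# Vectorized form: the multiplication is applied to every element at once
my_pct_vector <- my_vector * 0.01

# Explicit loop doing the same work (loop_result is an illustrative name)
loop_result <- numeric(length(my_vector))
for (i in seq_along(my_vector)) {
  loop_result[i] <- my_vector[i] * 0.01
}

# all.equal() compares the two results, allowing for floating-point rounding
print(all.equal(my_pct_vector, loop_result))
```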

Typically in data analysis, though, you want to apply functions to subsets of data: Finding the mean salary by job title or the standard deviation of property values by community. The apply() function group and plyr add-on package are designed for that.
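As a small taste of what that looks like, the sketch below uses tapply(), one member of the base R apply family, to find the mean salary for each job title. The job titles and salary figures are invented purely for illustration.

```r
# Made-up data for illustration: one job title and one salary per person
job_title <- c("analyst", "analyst", "editor", "editor", "editor")
salary <- c(50000, 60000, 40000, 45000, 50000)

# tapply() splits salary into groups by job_title and applies mean() to each
mean_by_title <- tapply(salary, job_title, mean)
print(mean_by_title)
```

Swapping mean for sd in the same call would give the standard deviation for each group instead.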

There are more than half a dozen functions in the apply family, depending on what type of data object is being acted upon and what sort of data object is returned. "These functions can sometimes be frustratingly difficult to get working exactly as you intended, especially for newcomers to R," says a blog post at Revolution Analytics, which focuses on enterprise-class R.
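To make the "what goes in, what comes out" distinction concrete, here is a minimal sketch of three members of the family in base R. The matrix m and the list scores are hypothetical sample data, not something from the blog post quoted above.

```r
# apply(): works on a matrix; 1 means "by row", 2 would mean "by column"
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
print(row_sums)

# Hypothetical sample data: a named list of two numeric vectors
scores <- list(a = 1:3, b = 4:6)

# lapply(): works on a list or vector and always returns a list
means_list <- lapply(scores, mean)
print(means_list)

# sapply(): same idea, but simplifies the result to a vector when it can
means_vec <- sapply(scores, mean)
print(means_vec)
```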

