The R programming language is an important tool for development in the numerical analysis and machine learning spaces. With machines becoming more important as data generators, the popularity of the language can only be expected to grow. But R has both pros and cons that developers should know.
Interest in the language is growing, as shown on language popularity indexes such as Tiobe, PYPL, and RedMonk. R first appeared in the 1990s as an implementation of the S statistical programming language. "R is the most popular language used in the field of statistics," notes Roger Peng, an 18-year R programming veteran who teaches R both at the university level and on the Coursera online platform.
"I like [R] because it's very easy to program in from a more computer science-y level," says Peng. R has also gotten faster over time, and it serves as a glue language for piecing together different data sets, tools, or software packages, Peng says.
"R is the best way to create reproducible, high-quality analysis. It has all the flexibility and power I'm looking for when dealing with data," says Matt Adams, a data scientist at Code School, which offers online programming education. "Most of the programs I write in R are actually just collections of scripts that are organized into projects."
R's strong package ecosystem and charting benefits
R's advantages include its package ecosystem. "The vastness of [the] package ecosystem is definitely one of R's strongest qualities -- if a statistical technique exists, odds are there's already an R package out there for it," says Adams.
"There's a lot of functionality that's built in that's built for statisticians," says Peng. R is extensible and offers rich functionality for developers to build their own tools and methods for analyzing data, he says. "As time has gone on, a lot more people have been attracted to it from other fields," including the biosciences and even the humanities.
"People can extend it without having to ask permission." Indeed, Peng recalls R's usage terms as being a big help many years ago. "At the time when it first came out, the biggest advantage was that it was free software. The source code and everything about it was available to look at."
R's graphics and charting capabilities, Adams says, are "unmatched." The dplyr and ggplot2 packages for data manipulation and plotting, respectively, "have literally improved my quality of life," he says.
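As a rough illustration of the kind of workflow these two packages support, the sketch below summarizes a data set with dplyr and charts the result with ggplot2. It assumes both packages are installed. The built-in mtcars data set and the names mpg_by_cyl and p are chosen here purely for illustration; they do not come from Adams.

```r
library(dplyr)
library(ggplot2)

# Average fuel economy per cylinder count, using R's built-in mtcars data
# (the data set and variable names are illustrative assumptions)
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# Bar chart of the summary table built above
p <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean miles per gallon")

print(p)
```

The data manipulation step and the plotting step are kept separate here, so the summary table can be inspected on its own before it is charted.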
For machine learning, R's advantages are linked mostly to its strong ties to academia, says Adams. "Any new research in the field probably has an accompanying R package to go with it from the get-go. So in this respect, R stays at the cutting edge," he says. "The caret package also offers a pretty nifty way of doing machine learning in R through a relatively unified API." Peng also notes that many popular machine learning algorithms are implemented in R.