

Learn to crunch big data with R

Martin Heller | Feb. 12, 2015
Get started using the open source R programming language to do statistical computing and graphics on large data sets.

You may also want to perform some of the analysis in the database rather than in the R application. IBM has done a good job of providing an example, along with the R source code. Consider the analysis shown in Figure 7.

Streaming the data out of the database and into R can take a significant amount of time. If you eliminate most of the network streaming, you can vastly reduce the time needed for the analysis. You'll notice that the in-database regression analysis took 2.7 seconds. The same task with the regression done in the application took 1.47 minutes (about 88 seconds), more than 30 times longer. The computed regression coefficients were exactly the same. The only difference was where the regression ran: one analysis did it where the data resided, and the other first streamed the data from the database to the R application.
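To make the "move the computation to the data" idea concrete, here is a minimal sketch in Python rather than R. It uses the standard-library `sqlite3` module as a stand-in for a remote database and assumes a simple one-predictor linear regression on a hypothetical table `obs`; IBM's example and the vendor R packages work differently. The in-database version asks SQL for five aggregates, so only five numbers come back; the in-application version fetches every row first. Because SQLite runs in-process, the sketch shows how much data has to move rather than reproducing the network delay.

```python
import sqlite3

# Hypothetical table "obs" with an exact linear relationship y = 2x + 1.
# An in-memory SQLite database stands in for a remote database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [(float(i), 2.0 * i + 1.0) for i in range(1000)],
)


def fit(n, sx, sy, sxx, sxy):
    """Ordinary least squares for y = intercept + slope * x from summary statistics."""
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


def in_database(conn):
    # The database computes the sums; only one row of five numbers comes back.
    row = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM obs"
    ).fetchone()
    return fit(*row)


def in_application(conn):
    # Every row is streamed to the client first, then aggregated locally.
    rows = conn.execute("SELECT x, y FROM obs").fetchall()
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return fit(n, sx, sy, sxx, sxy)


print(in_database(conn))
print(in_application(conn))
```

Both paths apply the same formulas to the same sufficient statistics, so they agree on the coefficients (on messier data, possibly up to floating-point summation order). What changes is the volume of data crossing the connection: five numbers versus every row.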

IBM's implementation is not unique; I used it only because I happened to have a Bluemix account. Vertica (HP), Greenplum (Pivotal), Oracle, and Teradata all have R packages, although I'm not sure how far the others have gone in the direction of in-database analytics.

By the way, I was pleasantly surprised to find that running RStudio Server Pro in a browser feels exactly like running RStudio on my desktop. Nicely done.

Shiny and R Markdown

Of course, developers and analysts never really get away with simply writing the code and determining the results. Top management wants monthly reports, and middle management wants to play with the data without knowing anything about what's under the covers. Enter shiny and rmarkdown, two R packages from RStudio for Web applications and reporting, respectively.

Figure 8 shows a simple Shiny app running in RStudio. The code is from Lesson 2 of the Shiny tutorial.

You can use Shiny to build interactive and "reactive" Web apps, with widgets that correspond to HTML control elements such as input fields. By "reactive," RStudio means that when a value changes, all values with dependencies on the changed value are recalculated, as you'd expect from a spreadsheet program. Figure 9 shows an interactive Shiny app with two widgets for input and a shaded choropleth map of U.S. census data for output.
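The spreadsheet analogy can be modeled in a few lines. The sketch below is written in Python rather than R and is not how Shiny is implemented; `Cell`, `set`, and the price/quantity example are hypothetical names chosen for illustration. Changing a constant cell recalculates every cell that depends on it, directly or indirectly, once each and in dependency order, so a cell that depends on two changed cells never sees a half-updated input.

```python
class Cell:
    """A spreadsheet-style cell holding either a constant or a formula over other cells."""

    def __init__(self, value=None, formula=None, inputs=()):
        self.value = value
        self.formula = formula
        self.inputs = list(inputs)
        self.dependents = []
        self.recomputes = 0  # how many times this cell's formula has run
        for cell in self.inputs:
            cell.dependents.append(self)
        if formula is not None:
            self._recompute()

    def set(self, value):
        """Change a constant cell, then recalculate everything downstream of it."""
        self.value = value
        for cell in self._downstream():
            cell._recompute()

    def _recompute(self):
        self.value = self.formula(*(c.value for c in self.inputs))
        self.recomputes += 1

    def _downstream(self):
        """All cells that depend on this one, directly or indirectly, in dependency order."""
        order, seen = [], set()

        def visit(cell):
            for dep in cell.dependents:
                if dep not in seen:
                    seen.add(dep)
                    visit(dep)
                    order.append(dep)  # appended only after everything downstream of it

        visit(self)
        order.reverse()  # reversed post-order: each cell comes after all of its inputs
        return order


price = Cell(10)
quantity = Cell(3)
subtotal = Cell(formula=lambda p, q: p * q, inputs=[price, quantity])
total = Cell(formula=lambda s: s + 5, inputs=[subtotal])  # hypothetical flat fee of 5

quantity.set(4)
print(subtotal.value, total.value)  # prints: 40 45
```

This version recalculates eagerly, the way a spreadsheet does; a reactive framework can instead defer the work until a value is actually needed.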

The interactive Shiny app in Figure 9 is a good example of how you can allow middle management to play with the data without their having to know what's under the covers.

To limit what is recomputed when an input changes, Shiny's reactive() wrapper function caches its value and recomputes it only after it has been invalidated. I'll forgo burdening you with an example, although you'll find one in Shiny Lesson 6. Shiny apps can run on your own hardware, or you can publish them to a server. For a quick example, have a look at Figure 10.
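Shiny's own reactive() is written in R and its internals are more involved. For readers who want to see the caching idea in code anyway, here is a minimal model in Python: `Source` plays the role of an input widget and `CachedExpr` the role of a cached reactive expression. Both class names, and the `load` function standing in for an expensive step, are hypothetical. Changing an input only marks downstream values as stale; nothing is recomputed until a value is requested, and anything not invalidated is served from the cache.

```python
class Source:
    """An input value, such as a widget setting. Changing it invalidates dependents."""

    def __init__(self, value):
        self._value = value
        self.dependents = []

    def get(self):
        return self._value

    def set(self, value):
        if value != self._value:  # an unchanged value invalidates nothing
            self._value = value
            for dep in self.dependents:
                dep.invalidate()


class CachedExpr:
    """A lazily evaluated expression that caches its result until invalidated."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs
        self.dependents = []
        self._valid = False
        self._value = None
        self.runs = 0  # how many times func has actually executed
        for src in inputs:
            src.dependents.append(self)

    def invalidate(self):
        # Mark the cached value stale and pass the invalidation downstream.
        # Nothing is recomputed here; that waits until someone calls get().
        if self._valid:
            self._valid = False
            for dep in self.dependents:
                dep.invalidate()

    def get(self):
        if not self._valid:
            self._value = self.func(*(src.get() for src in self.inputs))
            self._valid = True
            self.runs += 1
        return self._value


def load(name):
    # Hypothetical stand-in for an expensive step such as a database query.
    return [len(name), 2 * len(name), 3 * len(name)]


dataset = Source("census")
scale = Source(1)
loaded = CachedExpr(load, dataset)
report = CachedExpr(lambda rows, k: sum(rows) * k, loaded, scale)

report.get()   # first use: runs load() and the report
scale.set(10)  # invalidates only the report
report.get()   # reruns the report, reusing the cached load() result
print(loaded.runs, report.runs)  # prints: 1 2
```

Changing `scale` reruns only the cheap report step; changing `dataset` would invalidate both `loaded` and `report`, so the next `report.get()` would rerun `load()` as well.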



Sign up for CIO Asia eNewsletters.