In contrast, Spark was designed to tackle more complex queries involving machine learning and predictive modeling techniques, among others. "Things that Hadoop MapReduce was pretty good at, Spark is potentially better at," Monash said.
Another early adopter of Spark has been music streaming service Spotify, which uses the technology to run a set of machine learning algorithms that generate playlists tailored to each user's tastes.
Even Hadoop users are getting the message. Hadoop distributor Cloudera, which also includes Spark in its releases, has about 60 enterprise customers using Spark in some form or another, according to Monash. Other Hadoop distributors, notably Hortonworks and MapR, also offer Spark in their distributions.
The Spark project was started in 2008 at the University of California, Berkeley's AMPLab (the AMP stands for Algorithms, Machines and People). Now under the guidance of the Apache Software Foundation, the project gets more contributions than any other Apache software project. Core contributors include engineers and developers from companies such as Intel, Yahoo, Groupon, Alibaba and Mint.
Spark can be used in conjunction with Hadoop to analyze data stored on the Hadoop Distributed File System (HDFS), or it can be run on its own. Developers build Spark applications in Python, Java or Scala.
"Part of the attraction of Spark is that it has a pretty nice API [application programming interface] that makes it accessible to developers and engineers," said Reynold Xin, a Databricks co-founder.
We will see many more products and services based on Spark next year, predicted Databricks' Ghodsi. Programmers are often asked about their Spark chops.
"We've had multiple [job] candidates out there say that they have seen multiple exciting Spark projects," Ghodsi said.
Sign up for CIO Asia eNewsletters.