Hadoop vs Spark: Which is right for your business? Pros and cons, vendors, customers and use cases

Scott Carey | July 7, 2016
It's important to note that Hadoop and Spark are broadly different technologies, with different use cases

Gardella at Couchbase agrees: "I think the way Spark executes in machine learning is far better. It is clear that machine learning use cases have been stronger on the Spark side than on the Hadoop side."

Hadoop vs Spark: Customers

In a broad sense, a Hadoop specialist vendor like Hortonworks claims to work with 55 of the top 100 financial services companies and 75 of the top 100 retailers. Actual use cases are harder to come by, perhaps because the technology isn't as mature as the vendors would lead us to believe, perhaps because the customers still see the technology as a secret sauce.

DataStax customer British Gas Connected Homes is using Spark and Apache Cassandra to deliver real-time usage statistics to its customers from its smart-home devices.

Head of data and analytics at British Gas, Jim Anning, says: "We always knew we were doing the Internet of Things and we know that the number of connected devices is only going to rise. Those sensors are collecting data all the time. For example, our temperature sensor is delivering data every couple of minutes. Scaling that process with a traditional, relational database just wasn't going to cut it."

Innovative electric car maker Tesla uses Hadoop for its connected car data, travel booking company Expedia has been moving its data into a Hadoop environment as it continues to scale, and British Airways is a big exponent of Hadoop for data storage and analytics.

Gardella says Couchbase is seeing: "Companies that need to get the accounting done right or they go to prison are still running on traditional data warehouses. Banks and retailers are moving to Hadoop because it is just so much cheaper and more flexible and they have the guys with the data analyst skills you need."

Hadoop vs Spark: Vendors

Implementing Hadoop is possible in-house - Apache provides all the documentation required - or you can pick a vendor to conduct an enterprise deployment for you, complete with support. Spark is similar: do it yourself or go to a vendor, such as Hortonworks' Spark at Scale, Cloudera or MapR.

As of December 2015, Gartner lists seven vendors offering commercial editions of Hadoop: Amazon, IBM, Pivotal, Transwarp, Hortonworks, Cloudera and MapR. Vendors like Couchbase, MongoDB, DataStax, Basho and MemSQL offer Spark built on competing data management platforms.

NoSQL database vendors have been launching Spark connectors over the last year or so, with MongoDB being one of the most recent entries. VP of strategy Kelly Stirman says MongoDB is different because: "People view a connector as a marketing tactic to draw people into their funnels, so you see incomplete or not-feature rich connectors to check the box."

 
