Google Cloud Dataflow vs. Apache Spark: Benchmarks are in

Andrew C. Oliver | May 5, 2016
In a simple batch processing test, Google Cloud Dataflow beat Apache Spark by a factor of two or more, depending on cluster size

It was a privilege to work with Google, one of the originators of big data technology. Ultimately, we think the clash between Google Cloud Dataflow and Spark is a "cold war" in which the users of these technologies win. As Google Cloud Dataflow adds a feature, Spark will inevitably work to one-up it, and the cycle will begin again. Some people may complain about the number of choices we have in engines and APIs, but the competition is driving innovation in ways we haven't seen in the software industry in years.

The bottom line is that Google Cloud Dataflow is an excellent option for companies looking to do production-level big data processing in the cloud. It might not be the best choice for data scientists experimenting with data, because it lacks REPL support. But with Apache Beam, you could potentially write your production code once, run it on different engines (including Dataflow, Spark, and Flink), and make your choice later. Go ahead and check out the benchmarks yourself.

Source: InfoWorld

 
