Cascading is a stand-alone, open source Java application framework designed as an alternative API to MapReduce. It gives Java developers the ability to build big data applications on Hadoop using their existing skill set. Chris Wensel, founder and CTO of Concurrent, says he created Cascading in anger after using MapReduce once, vowing never to use it again.
Cascading now averages more than 275,000 user downloads a month.
Spinning new compute fabrics
Cascading 3.0, the latest release, is a milestone that adds native support for new compute fabrics on top of its existing portability across programming languages (Java, SQL, Scala) and Hadoop distributions (Cloudera, Hortonworks, MapR). Out of the gate, Cascading 3.0 supports the Apache Tez compute fabric, and Wensel says others will follow quickly; he is currently working on support for Apache Spark.
Wensel notes that although Concurrent promised Spark support more than a year ago, Spark proved extremely young, and constant changes to its API made it difficult to work with. And while there have been many requests for Spark support, he believes people aren't so much fixated on Spark as they are interested in "faster than MapReduce."
That's where Apache Tez comes in. Wensel chose to concentrate on Tez while Spark matured, although Tez is itself a relatively young project.
"Tez removes some of the overhead of MapReduce, but it comes with a cost as well," Wensel says. "We thought it important that Cascading 3.0 simultaneously support MapReduce and Apache Tez. Tez is still early compared with the amount of time it's taken to make MapReduce stable."
"You can write your business logic once on Cascading, have it up and running on MapReduce, and then switch it over to Apache Tez to see whether it runs more performantly and reliably," he adds. "Installing Tez is really just a matter of dropping some .jar files on HDFS."
What's new in Cascading 3.0
New features and benefits in Cascading 3.0 include the following:
- It allows enterprises to build data applications once and then run them on the compute fabric that best meets business needs.
- It supports local in-memory execution, MapReduce and Apache Tez.
- Its pluggable query planner provides a flexible runtime layer that lets new compute fabrics integrate while adhering to the semantics of a given compute engine, such as MapReduce.
- It extends these portability benefits to third-party products, data applications, frameworks and dynamic programming languages built on Cascading.
- It is compatible with all major Hadoop vendors and service providers, including Altiscale, Amazon EMR, Cloudera, Hortonworks, MapR, Qubole and others.
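To make the "build once, run on the fabric that fits" idea more concrete, here is a minimal Java sketch of the pattern under loudly stated assumptions: it does not use Cascading's actual API, and the names `Fabric`, `SequentialFabric`, `ParallelFabric`, `TOKENIZE` and `select` are hypothetical. The business logic (a simple word count) is written once, and a small selector, loosely standing in for a pluggable planner, decides which in-memory "fabric" executes it. The two fabrics here are just sequential and parallel Java streams, used as stand-ins for engines such as MapReduce and Tez.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WriteOnceRunAnywhere {

    // Hypothetical compute-fabric abstraction for illustration only; this is NOT Cascading's API.
    interface Fabric {
        String name();

        // Applies the fabric-agnostic tokenizing logic to every record and counts each token.
        Map<String, Long> execute(List<String> records, Function<String, Stream<String>> tokenize);
    }

    // Stand-in for a local in-memory engine: a plain sequential stream.
    static final class SequentialFabric implements Fabric {
        public String name() { return "sequential"; }

        public Map<String, Long> execute(List<String> records, Function<String, Stream<String>> tokenize) {
            return records.stream()
                    .flatMap(tokenize)
                    .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
        }
    }

    // Stand-in for a different execution engine: the same logic on a parallel stream.
    static final class ParallelFabric implements Fabric {
        public String name() { return "parallel"; }

        public Map<String, Long> execute(List<String> records, Function<String, Stream<String>> tokenize) {
            Map<String, Long> counts = records.parallelStream()
                    .flatMap(tokenize)
                    .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
            // Copy into a sorted map so results print in the same order as the sequential fabric.
            return new TreeMap<>(counts);
        }
    }

    // Business logic written once: lower-case each line and split it into non-empty words.
    static final Function<String, Stream<String>> TOKENIZE =
            line -> Arrays.stream(line.toLowerCase().split("\\W+")).filter(w -> !w.isEmpty());

    // Chooses a fabric by name, loosely mirroring the idea of a pluggable planner.
    static Fabric select(String fabricName) {
        switch (fabricName) {
            case "sequential": return new SequentialFabric();
            case "parallel": return new ParallelFabric();
            default: throw new IllegalArgumentException("unknown fabric: " + fabricName);
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "The lazy dog", "the end");
        // Same logic, two different execution strategies.
        for (String choice : List.of("sequential", "parallel")) {
            Fabric fabric = select(choice);
            System.out.println(fabric.name() + " -> " + fabric.execute(input, TOKENIZE));
        }
    }
}
```

Both runs print the same counts, `{brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}`, which is the point of the pattern: only the execution strategy changes, not the logic. In a real Cascading deployment the switch would be made through the library's own classes, which this toy does not model.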
Sign up for CIO Asia eNewsletters.