The CEO of a new Yahoo spinoff dedicated to developing and promoting the popular Apache Hadoop distributed computing platform on Wednesday urged adoption of Apache's Hadoop distribution. But Cloudera, also a player in the Hadoop space, is sticking by its own Hadoop platform.
A day after Yahoo announced Hortonworks, a joint venture with Benchmark Capital to further develop Hadoop, Hortonworks CEO Eric Baldeschwieler stressed commitment to Apache Hadoop. "We're just asking everyone to commit to basing their offerings on the Apache Hadoop offering, and anybody who does that is our partner," Baldeschwieler said in a presentation at the Hadoop Summit 2011 event in Silicon Valley.
Hadoop is becoming popular for managing large volumes of data. More sites download Apache's Hadoop than any other release, said Baldeschwieler. But in a statement obtained afterward by InfoWorld, Cloudera COO Kirk Dunn reaffirmed support for his company's own Hadoop technologies. "Apache Hadoop absolutely is the foundation. Cloudera's Distribution Including Apache Hadoop is a 100-percent open-source platform that includes Apache Hadoop (not a fork or derivative, but actual Apache Hadoop). One of the things that Cloudera has pioneered is including not just Apache Hadoop, but also the full Hadoop stack -- Apache Pig, Apache Hive, Apache Hbase, Apache Flume, Apache Sqoop, Apache Zookeeper, Apache Whirr, and others -- which, when integrated and bundled, makes Hadoop more consumable and easier to manage. When deploying even moderately sized Hadoop clusters, these are non-trivial issues with which every enterprise has to deal."
The relationship between Cloudera and Hortonworks is "yet to be determined," Baldeschwieler said. "We've been working together on making Apache Hadoop great the last few years. We've had our differences." There is room for lots of players in this space, he said.
Baldeschwieler also stressed the potential for Hadoop and the level of interest in it in the enterprise and government worldwide. "We really believe that half the world's data will be stored in Apache Hadoop over the next five years." Hortonworks will focus on making Apache Hadoop "great," he said.
Hadoop, though, needs better third-party support and faces a knowledge gap that is impeding its growth, Baldeschwieler said. "That's our opportunity and our role and our aspirations -- to really bridge the gap and make Apache Hadoop much easier to install, use, and manage in companies around the world." Hortonworks will sell training and support and make Hadoop better, taking on such tasks as opening up APIs and improving administration, management, availability, and robustness, he said.
Jay Rossiter, senior vice president for Yahoo's Y! cloud platform group, said Hadoop is not quite mainstream but is getting there, with governments and companies involved in areas like the Web and finance adopting it. Yahoo started developing Hadoop about five years ago as a research project, Rossiter said at the event. Besides Yahoo, companies like IBM have used it: IBM leveraged Hadoop in the Watson computer system recently featured on the "Jeopardy" TV game show.