Cloudera offers a distribution of open source Hadoop that draws on many aspects of the Apache project but adds a number of advancements on top of it. Cloudera has developed several features for its product, from a management and monitoring tool named Cloudera Manager to a SQL engine named Impala for querying relational data on Hadoop. Cloudera uses open source Hadoop as the basis of its distribution, but it is not a pure open source product. When Cloudera's customers need something that open source Hadoop doesn't have, Cloudera builds it or finds a partner who has it. "Cloudera's approach to innovation is to be loyal to core Hadoop but to innovate quickly and aggressively to meet customer demands and differentiate its solution from those of other vendors," Forrester says. The result has been steady adoption of Cloudera's platform, with more than 200 paying customers, Forrester says, some of whom have more than 1 petabyte under management across more than 1,000 nodes.
Like Cloudera, Hortonworks is a pure-play Hadoop company. Unlike Cloudera, Hortonworks sticks to the open source Hadoop code more closely than perhaps any other vendor. Hortonworks' goal is to build up the Hadoop ecosystem and user base and to advance the open source code. Company officials say this benefits users because it prevents vendor lock-in: if a Hortonworks customer ever needed to leave the platform, they could easily port their applications to the open source code. That's not to say Hortonworks does not innovate on top of the open source code, though. The company contributes all of its platform development work back to the open source community. An example is Ambari, a tool Hortonworks developed to fill a gap in the project around cluster management. This approach has earned Hortonworks strong partnerships with vendors like Teradata, Microsoft, Red Hat and SAP.
When enterprises think of big IT projects, many think of IBM, and rightly so. Because of that, IBM has become a major player in the world of Hadoop projects. Forrester says IBM already has more than 100 Hadoop deployments and many customers with petabytes' worth of data. The company brings its vast experience in grid computing, global data centers and enterprise implementations to its big data projects. "IBM's road map includes continuing to integrate the BigInsights Hadoop solution with related IBM assets like SPSS advanced analytics, workload management for high performance computing, BI tools, and data management and modeling tools," Forrester says.
Like Amazon Web Services, Intel is optimizing its version of Hadoop to run on its own hardware, specifically its Xeon chips. For customers looking to push the limits of their Hadoop systems and seeking the closest affinity between software and hardware, Intel's distribution of Hadoop could be the right choice. Forrester notes, however, that Intel only recently rolled this product out, so the company is expected to innovate quite a bit on top of the version it has on the market now. Forrester named Intel and Microsoft "strong performers" in the Hadoop marketplace, while the other seven companies listed above were named "leaders."