If you've got a lot of data, then Hadoop either is, or should be, on your radar.
Once reserved for Internet empires like Google and Yahoo, the most popular and well-known big data management system is now creeping into the enterprise. There are two big reasons for that: 1) Businesses have a lot more data to manage, and Hadoop is a great platform, especially for combining legacy data with new, unstructured data; and 2) a lot of vendors are jumping into the game of offering support and services around Hadoop, making it more palatable for enterprises.
"Hadoop is unstoppable as its open source roots grow wildly and deeply into enterprise data management architectures," Forrester analysts Mike Gualtieri and Noel Yuhanna wrote recently in the company's Wave Report on the Hadoop marketplace. "Forrester believes that Hadoop is a must-have data platform for large enterprises, forming the cornerstone of any flexible future data management platform. If you have lots of structured, unstructured, and/or binary data, there is a sweet spot for Hadoop in your organization."
So where do you start? Forrester says there are a variety of places to go, and it evaluated nine vendors offering Hadoop services to find the pros and cons of each. Forrester concluded that there is no clear market leader at this point, with relatively young companies in this market offering compelling services alongside the tech titans.
First, some background: Hadoop is an open source Apache project whose core components anyone can freely download - these include Hadoop Common, Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce. Many companies, including IBM, Amazon Web Services, Microsoft and Teradata, have packaged Hadoop into more easily consumable distributions or services. Each company takes a slightly different strategy, but the key appeal underlying all of them is Hadoop's ability to distribute workloads across potentially thousands of servers, making big data manageable.
Note: This list is based on vendors listed in Forrester's Wave report and is not meant to be all-encompassing of Hadoop and big data management platforms. Vendors are listed in alphabetical order.
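For readers who want a feel for the programming model behind Hadoop MapReduce, here is a minimal sketch in plain Python. It runs in a single process and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are illustrative assumptions, not part of Hadoop. On a real cluster, each chunk would be processed on a different server and the framework would handle the grouping step.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle, and reduce over a list of text chunks."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for a block stored on a different server.
chunks = ["big data big cluster", "data on a cluster"]
counts = word_count(chunks)
```

Because each call to `map_phase` depends only on its own chunk, the map work can in principle be spread across as many machines as there are chunks, which is the property that lets this model scale out.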
Amazon Web Services
Customers looking for a public cloud hosted Hadoop platform needn't look much further than the company Forrester calls the "King of the cloud" - Amazon Web Services. The company's Hadoop product is named Elastic MapReduce (EMR), which AWS says uses Hadoop to offer big data management services. It is not pure open source Hadoop, though; it has been tweaked to run specifically on AWS's cloud.
Forrester says that EMR has the largest adoption of the Hadoop platforms in the market. It already has a wide variety of partners that offer services on top of EMR, such as ones that specialize in query, modeling, integration and management. And AWS is innovating; on the roadmap, according to Forrester, is the ability for EMR to automatically scale and resize based on workload needs. The company plans to roll out more robust support for EMR with its other products and services, including its Redshift data warehouse and its newly announced Kinesis real-time processing engine, and it plans to offer support for additional NoSQL databases and business intelligence tools. The one thing AWS does not have is a Hadoop distribution that users can run on their own premises, but the next two companies specialize in that.