EMC today formally announced a reseller partnership with MapR Technologies, a start-up that plans to sell a proprietary MapReduce product based on Apache Hadoop.
To date, MapR has been in development mode. The company has 15 beta customers testing its product, which will be sold both as software and as a stand-alone appliance.
"With the EMC deal, we get worldwide distribution," said John Schroeder, CEO of MapR. "[And]...we get a worldwide support organization."
MapR will be part of the recently announced EMC Greenplum HD Enterprise Edition, an interface-compatible implementation of the Apache Hadoop software stack.
Earlier this month, EMC announced its planned partnership with MapR as part of a new push into big data database and MapReduce products.
MapReduce is a framework for processing enormous data sets and performing high-performance analytics in a distributed system that runs across a cluster of server nodes. In every cluster, a master node coordinates the mapping function. As data is input, it is partitioned into smaller sub-groups, which are processed in parallel as parts of a larger query. Because the query is broken into subsets that run at the same time, MapReduce can be faster than traditional relational databases at processing "big data" sets.
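The partition, map and reduce steps described above can be illustrated with a minimal single-machine sketch in Python. It is not Hadoop or MapR code; the function names (partition, map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, and the "nodes" are simply list chunks processed one after another rather than in parallel.

```python
from collections import defaultdict
from itertools import chain


def partition(records, num_chunks):
    """Split the input into sub-groups, one per simulated worker node."""
    return [records[i::num_chunks] for i in range(num_chunks)]


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one sub-group."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_pairs):
    """Group all emitted values by key so each key is reduced once."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the grouped values for each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records, num_chunks=3):
    """Run the whole pipeline: partition, map, shuffle, reduce."""
    chunks = partition(records, num_chunks)
    # On a real cluster each chunk would be mapped on a different node.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical sample input, for illustration only.
lines = ["big data big cluster", "data node", "cluster node node"]
counts = word_count(lines)
print(counts)
```

Because each sub-group is mapped independently, the map step is the part a real cluster can spread across many nodes; only the grouped results need to be brought together for the reduce step.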
"This is a major advancement for Hadoop users everywhere. MapR's innovations coupled with EMC's big data analytics capabilities and service will allow more people to use the power of big data analytics and enable substantial market growth," John Webster, a senior partner at market research firm the Evaluator Group, said in a statement. "MapR has managed to innovate on performance, cost reduction, dependability and ease-of-use all at once. This marks a major shift for the Hadoop market."
Luke Lonergan, CTO of EMC's Data Computing Division and a co-founder of Greenplum, the maker of a massively parallel data warehouse that EMC bought last year, said that EMC is working with dozens of resellers to get the MapR Hadoop software to customers.
"Combined with the EMC Greenplum Database, we will allow the co-processing of both structured and unstructured data within a single, seamless solution," said Scott Yara, co-founder of Greenplum and vice president of products for EMC's Data Computing Division.
MapR built a proprietary replacement for the Hadoop Distributed File System (HDFS) that can be substituted for HDFS in existing installations. MapR's product adds faster performance and greater resilience, according to Schroeder.
"HDFS is really like writing to CD ROM. You can write a file to it, but you can't access it through multiple readers. It's very constrained," he said.