Technically, engineers could already architect the system to support additional analysis functionality on top of MapReduce, but Yarn now acts as a platform for hosting apps for that specific purpose. Some believe Yarn could be the base-level framework for a platform as a service (PaaS) running on Hadoop that could compete with the likes of VMware's open-source Cloud Foundry PaaS.
Apache Hadoop 2.0 is expected to be declared stable enough for a beta release at some point this week, with a general availability release expected in the coming weeks after that, Murthy says. Some of Hadoop's earliest adopters, like Yahoo, have already tested Yarn, and companies that create commercial distributions of the code are expected to integrate Yarn into their offerings as well. Hortonworks, for example, hopes to have Yarn functionality in its Hadoop distribution by mid- to late summer.
So does 2.0, and specifically Yarn, represent Hadoop growing up? "Absolutely," says Adrian, the Gartner analyst. "But mainstream organizations need to rely on the commercial distributor for anything they expect to put into serious production use." Companies like Hortonworks, Cloudera, MapR and even IBM all have commercial distributions of the code. While the project may be growing up, it's still in its "early adolescence," Adrian notes. The addition of Yarn could go a long way toward supporting a budding industry of creating applications that run on Hadoop, though.