Back in the 1800s, John Godfrey Saxe wrote a famous poem about six blind men trying to discover what an elephant is, each by touching one part and describing it. Saxe observes that "each was partly in the right, and all were in the wrong." Fast forward to today and we have a similar situation with Hadoop. Opinions about Hadoop are as varied, and sometimes as incorrect, as they were in the poem.
Hadoop has been variously described as the ideal way to do transaction processing, the ideal way to do search, and the ideal way to do analysis, all of which are quite different use cases. If that were not unlikely enough, it is also claimed to be the best way to analyze structured data, semi-structured data and unstructured data. In fact, we are led to believe that it is everything to everyone.
How is this possible?
Hadoop is a primitive, undifferentiated technology that can be molded in various ways. In the evolutionary tree, it is far closer to low-level programming languages like C and Java than it is to function-specific programs like database management systems, let alone higher-level user applications like spreadsheets.
When people look at Hadoop and describe it in widely varying ways, they are all right, since it is like clay that can, in theory, be molded into whatever shape is required. The problem is that they are also wrong, in that Hadoop really is just a lump of clay. Turning it into something useful requires a lot of skill, time and effort. Hadoop 2.0 has done nothing to change this.
Now, I am not suggesting that Hadoop 2.0 isn't high-quality clay. You definitely need lower-level technologies upon which to build the higher-level ones. It's just that the current hype seems to be misplaced. When people praise Frank Lloyd Wright's Fallingwater, how often do they emphasize the chemical composition of the concrete? The important thing about a piece of software is how easy it is to use and apply productively.
Like the traditional analytical stack that employs things like data integration (DI), data warehousing and business intelligence (BI), Hadoop, and Hadoop 2.0 with it, has given us a new stack that is just as complex and acronym-rich: HDFS2 to YARN to HBase to various flavors of BI. In this new Hadoop world, data still has to be continually moved from place to place. Too many layers separate users from their data. Too much time and know-how are required to prepare data. The result: gainfully employed technologists, frustrated business managers, and a lost opportunity to remove the barriers that separate business users from insight.