What does Netezza's owner IBM think? Well, it thinks about Spark.
Self-service is the goal
Most companies want to achieve a level of self-service from Hadoop. According to the study, the companies that have achieved significant business value have already reached some level of self-service.
Self-service has multiple meanings. On the one hand, you need fewer people involved in managing Hadoop. On the other hand, you need a sufficient amount of data in the lake so that a new feed isn't needed for each new report or dashboard. You also need views and general structure around it to make sure a mere mortal can query it with SQL. Yes, the main way people practice self-service is with SQL tools.
According to the study, most people haven't achieved self-service and thus haven't achieved the tangible value they were looking for.
Anemic 10-node clusters are losers
"Hello world" in Hadoop 2 takes 12 nodes. Anything smaller and you get a really slow version of what you already had in SQL Server. According to the study, the people who have larger clusters have achieved more value.
This isn't shocking. I've mentioned more than a few times that Hive is slow but scales well, and that can be said of other Hadoop technologies. If you have a 10-node cluster, it's probably barely functional. Of course you didn't achieve value.
Second, if revenue generation (14 percent) or scale-out (37 percent) is your main business driver rather than cost, but you don't actually scale out -- then of course you don't achieve value. I'd say this connects to another finding of the survey, which I've covered many times before: If you have an executive mandate, not sponsorship alone, you have a 20 percent better chance of achieving value. In my experience, having an executive mandate usually results in a larger cluster.
Exploring the verticals
I made my bones as an open source developer, but today, I do a lot of what could be called sales or sales engineering. Doing this requires fun exercises, such as: "Which industries do we want to focus on?" I came up with financial services, health care, retail, and manufacturing. This was mainly a function of what we'd done before, where we are, and who calls us the most. According to the study, retail didn't make the short list of industries using Hadoop, despite producing many early success stories.
Manufacturing, consulting, telecommunications, financial services, and health care all made the list. In my opinion, the opportunities are growing quickest in financial services and health care.
You'd expect financial services companies to be relatively mature users of Hadoop, but that seems to hold true only for the top-tier banks. The next level of clients are barely kicking the tires, but want what the big boys have. Meanwhile, the Affordable Care Act has driven the need for "meaningful use" of electronic medical records. This involves data from disparate systems -- for infection control, population health management, and so on.