Will 2014 see the emergence of a big data equivalent of the LAMP stack?
Richard Daley, one of the founders and chief strategy officer of analytics and business intelligence specialist Pentaho, believes that such a stack will begin to come together this year as consensus begins to develop around certain big data reference architectures—though the upper layers of the stack may have more proprietary elements than LAMP does.
"There's thousands of big data reference architectures out there," Daley says. "This is going to be more of a 'history repeats itself' kind of thing. We saw the exact same thing happen back with the LAMP stack. It's driven by pain. Pain is what's going to drive it initially; pain in the form of cost and scale."
But, Daley says, organizations that turn to big data technologies to deal with that pain quickly begin to see the upside of the data, particularly those that leverage it for marketing or for network intrusion detection. According to a CompTIA study, 42 percent of organizations were already engaged in some form of big data initiative in 2013.
"In the last 12 months, we've seen more and more people doing big data for gain," he says. "There is much more to gain from analyzing and utilizing this big data than just storing it."
The explosion of dynamic, interactive websites in the late 1990s and early 2000s was driven, at least in part, by the LAMP stack, consisting of Linux, Apache HTTP server, MySQL and PHP (or Perl or Python). These free and open source components are all individually powerful tools developed independently, but come together like Voltron to form a Web development platform that is more powerful than the sum of its parts. The components are readily available and have open licenses with relatively few restrictions. Perhaps most important, the source is available, giving developers a tremendous amount of flexibility.
While the LAMP stack specifies the individual components (though substitutions at certain layers aren't uncommon), the big data stack Daley envisions has a lot more options at each layer, depending on the application you have in mind.
'D' Is for the Data Layer
The bottom layer of the stack, the foundation, is the data layer. This is the layer for the Hadoop distributions, NoSQL databases (HBase, MongoDB, CouchDB and many others), even relational databases and analytical databases like SAS, Greenplum, Teradata and Vertica.
"Any of those technologies can be used for big data applications," Daley says. "Hadoop and NoSQL are open, more scalable and more cost-effective, but they can't do everything. That's where guys like Greenplum and Vertica have a play for doing some very fast, speed-of-thought analytical applications."