Perhaps you've heard that the next new thing in IT is "big data" and concluded that the hype-cycle machine is turning out another attention-getter. I'm not big on predicting paradigm shifts, so I won't in this case. But I will say that if you're an IT professional, you ignore big data at your peril. I believe this one is all it's cracked up to be and more.
First, a word of caution. As with the cloud (the last new thing), we are now in the definition stage. New and often conflicting definitions abound as vendors attach their own meanings to the term big data.
The most common source of confusion is the conflation of big data storage with big data analytics. Big data analytics is the big deal. Big data storage is really nothing more than storage that handles a lot of data for applications like high-definition video streaming.
One large storage vendor that has yet to make a big-data statement told me that his company was considering "Huge Data" as a moniker for its big data storage entry. Seriously. Someday soon, big data storage will begin to support big data analytics. Right now, though, I think it's key to first figure out if the vendor is pitching storage or analytics.
The definition of big data analytics is also getting pulled in somewhat conflicting directions. A useful way in is to start with an understanding of data warehousing and then add the capabilities that the classic data warehouse doesn't offer.
For starters, big data analytics encompasses unstructured and structured data. It's widely believed that 80% of all data is unstructured. Big data analytics means that unstructured data -- the bulk of what's out there -- can now be mined.
Second, the classic data warehouse user sets up queries and gets results anywhere from a day to a week later, whereas the goal for many big data analytics processes is to deliver results to users in real time.
Finally, data warehousing works with a limited number of data sources. Big data analytics has the power to combine disparate sources -- like a supply chain tracking system that commingles RFID, GPS and product shipment data -- to deliver information previously unattainable.
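To make the supply chain example a little more concrete, here is a minimal sketch in Python of what combining the three feeds might look like. Everything in it is an assumption made for illustration: the record layouts, the field names (`tag`, `truck`, `fix_at` and so on), the sample values and the `locate_orders` helper are all hypothetical, and a real system would be pulling from live feeds at far greater volume.

```python
# Illustrative, made-up records standing in for three separate source systems.
rfid_reads = [  # dock scanner: which pallet tag was loaded onto which truck
    {"tag": "PALLET-1", "truck": "TRUCK-A", "read_at": 100},
    {"tag": "PALLET-2", "truck": "TRUCK-B", "read_at": 105},
]
gps_fixes = [  # fleet tracker: where each truck was at a given time
    {"truck": "TRUCK-A", "lat": 1.35, "lon": 103.82, "fix_at": 200},
    {"truck": "TRUCK-A", "lat": 1.29, "lon": 103.85, "fix_at": 300},
    {"truck": "TRUCK-B", "lat": 3.14, "lon": 101.69, "fix_at": 250},
]
shipments = [  # order system: which order and product each pallet carries
    {"tag": "PALLET-1", "order": "ORD-17", "product": "widgets"},
    {"tag": "PALLET-2", "order": "ORD-42", "product": "gadgets"},
]


def latest_fix_by_truck(fixes):
    """Keep only the most recent GPS fix for each truck."""
    latest = {}
    for fix in fixes:
        current = latest.get(fix["truck"])
        if current is None or fix["fix_at"] > current["fix_at"]:
            latest[fix["truck"]] = fix
    return latest


def locate_orders(rfid_reads, gps_fixes, shipments):
    """Join the three sources: order -> pallet tag -> truck -> latest position."""
    truck_by_tag = {read["tag"]: read["truck"] for read in rfid_reads}
    fix_by_truck = latest_fix_by_truck(gps_fixes)
    located = {}
    for shipment in shipments:
        truck = truck_by_tag.get(shipment["tag"])
        fix = fix_by_truck.get(truck)
        if fix is not None:  # skip orders whose pallet or truck is unknown
            located[shipment["order"]] = (
                shipment["product"], truck, fix["lat"], fix["lon"]
            )
    return located


order_locations = locate_orders(rfid_reads, gps_fixes, shipments)
```

No single feed can say where a given order is right now; the answer only appears once the pallet tag links the order to a truck and the truck links to its latest position.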
I could say that any definition of big data analytics must combine all three of these attributes, but that would be misleading. What clearly isn't helpful, though, is relabeling something as "big data," such as calling a traditional data warehousing product big data simply because it handles bigger data volumes.
Rather than quibbling over definitions at this stage, what we really should be after as IT professionals is understanding and, ideally, putting to use what is new. The ability to bring unstructured data into the business analytics process is new. The ability to converge multiple data sources -- structured and unstructured -- is new. And the ability to produce new types of information in real time is decidedly new and powerful.