Big data is quickly becoming an important part of large-scale business operations. But it's never been very quick itself, due to limitations on how that data is stored, manipulated, and retrieved.
Platfora has developed a solution for big data users that provides business analysts with self-service access, rather than requiring IT to maintain fixed-purpose reporting and analytics, which may fail to deliver the information businesses need to make timely, effective decisions.
In this edition of the New Tech Forum, Ben Werther, CEO of Platfora, gives us a look at how big data can be used for agile business intelligence, without the traditional hangups and sluggish performance. —Paul Venezia
Bringing big data into focus with a better lens
Big data analytics today tends to suffer from an inherent contradiction: To gain competitive advantage, many companies are jumping on big data technologies, which enable them to process raw data in new ways — yielding sharper and much more timely business intelligence. Yet the traditional processes for extracting business intelligence from big data and sharing it throughout the organization are anything but fast.
Without question, the Apache Hadoop open source project has helped to advance big data analytics. Hadoop is highly scalable and provides a framework for distributed processing of very large data sets across clusters of computers using cost-effective commodity hardware. Hadoop's flexible "schema on read" approach enables companies to define schema after data has been stored, instead of being constrained by the traditional database "schema on write" model. But Hadoop has limitations that must be overcome if businesses want to take full advantage of their raw data in all forms.
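To make the "schema on read" idea concrete, here is a minimal sketch in plain Python. It is not Hadoop code: the raw lines stand in for files stored as they arrived, and a schema is applied only when the data is read. The sample records, the field names, and the `read_with_schema` helper are all hypothetical, chosen purely for illustration.

```python
import json

# Raw events are kept exactly as they arrived; nothing is validated on write.
# (Hypothetical sample data; in Hadoop these would be files in the cluster.)
RAW_STORE = [
    '{"user": "alice", "action": "click", "ms": "120"}',
    '{"user": "bob", "action": "view"}',
    '{"user": "carol", "action": "click", "ms": "95", "extra": "ignored"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) at read time."""
    for line in raw_lines:
        record = json.loads(line)
        # Missing fields become None; fields outside the schema are dropped.
        yield {
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        }


# Two different schemas applied to the same stored data.
latency_schema = {"user": str, "ms": int}
activity_schema = {"user": str, "action": str}

latency_rows = list(read_with_schema(RAW_STORE, latency_schema))
activity_rows = list(read_with_schema(RAW_STORE, activity_schema))
```

The point of the sketch is that the same stored data serves two different questions without being reloaded or restructured. Under a "schema on write" model, the second and third records would have had to be rejected or reshaped before they could be stored at all.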
MapReduce was the original programming model used to process these large data sets in Hadoop. Using it required companies to hire MapReduce experts, train in-house IT staff, or both, in order to pull data out of Hadoop and into a legacy data warehouse. This approach is time-consuming and resource-intensive, and it does not provide the subsecond response times required in production environments. Early adopters of the technology have also used Apache Hive and derivative technologies to connect to Hadoop by translating SQL-like queries into MapReduce jobs, but the process is still slow and requires experts. Additionally, these necessary steps toward making Hadoop work for the organization often place significant burdens on IT teams.
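For readers unfamiliar with the model, the following is a rough single-machine sketch of the map, shuffle, and reduce steps in plain Python. It does not use any Hadoop or Hive API, and the function names and sample log records are hypothetical. It computes roughly what a SQL-like query such as `SELECT page, COUNT(*) FROM logs GROUP BY page` would ask for, which hints at why even a simple question takes several hand-written stages in this model.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, 1) pair per record, keyed by page."""
    for record in records:
        yield record["page"], 1


def shuffle_phase(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical Web log records standing in for data stored in a cluster.
logs = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
page_counts = reduce_phase(shuffle_phase(map_phase(logs)))
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data across the network, so this sketch shows only the shape of the computation, not its cost.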
The inflexibility and latency of big data analytics are particularly frustrating for business analysts, who are under pressure to deliver timely and actionable business intelligence to the organization. Not only do they typically have little or no control over the data analytics process, but most don't even realize how much valuable insight is likely being overlooked because of technology constraints. As the volume of semistructured and, increasingly, multistructured data (Web logs, mobile application server logs, tweets, Facebook Likes, audio files, emails, and more) continues to balloon, the situation will only worsen, yielding more frustration on all sides.