2. Hadoop: The new enterprise data operating system
Distributed analytic frameworks, such as MapReduce, are evolving into distributed resource managers that are gradually turning Hadoop into a general-purpose data operating system, says Hopkins. With these systems, he says, "you can perform many different data manipulations and analytics operations by plugging them into Hadoop as the distributed file storage system."
What does this mean for the enterprise? As SQL, MapReduce, in-memory, stream processing, graph analytics and other types of workloads are able to run on Hadoop with adequate performance, more businesses will use Hadoop as an enterprise data hub. "The ability to run many different kinds of [queries and data operations] against data in Hadoop will make it a low-cost, general-purpose place to put data that you want to be able to analyze," Hopkins says.
Intuit is already building on its Hadoop foundation. "Our strategy is to leverage the Hadoop Distributed File System, which works closely with MapReduce and Hadoop, as a long-term strategy to enable all types of interactions with people and products," says Loconzolo.
3. Big data lakes
Traditional database theory dictates that you design the data set before entering any data. A data lake, also called an enterprise data lake or enterprise data hub, turns that model on its head, says Chris Curran, principal and chief technologist in PricewaterhouseCoopers' U.S. advisory practice. "It says we'll take these data sources and dump them all into a big Hadoop repository, and we won't try to design a data model beforehand," he says. Instead, it provides tools for people to analyze the data, along with a high-level definition of what data exists in the lake. "People build the views into the data as they go along. It's a very incremental, organic model for building a large-scale database," Curran says. On the downside, the people who use it must be highly skilled.
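To make the "build the views as you go" idea concrete, here is a minimal sketch of applying a schema at read time rather than at load time. It is an illustration only, not how Intuit or any Hadoop tool implements a data lake: the `RAW_LAKE` records, the field names, and the helpers `read_view` and `page_views_per_user` are all hypothetical, and a plain Python list stands in for files stored in a Hadoop repository.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (no shared schema).
# In a real lake these would be files in distributed storage, not a list.
RAW_LAKE = [
    '{"source": "clickstream", "user": "u1", "page": "/pricing", "ms": 5400}',
    '{"source": "billing", "customer_id": "u1", "amount_usd": 49.0}',
    '{"source": "clickstream", "user": "u2", "page": "/home", "ms": 1200}',
    '{"source": "billing", "customer_id": "u2", "amount_usd": 19.0}',
    '{"source": "clickstream", "user": "u1", "page": "/checkout", "ms": 800}',
]


def read_view(raw_lines, source, fields):
    """Apply a schema at read time: keep one source, project chosen fields."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("source") == source:
            # Fields missing from a record come back as None.
            rows.append({name: record.get(name) for name in fields})
    return rows


def page_views_per_user(raw_lines):
    """A view defined later, on demand, without redesigning the stored data."""
    counts = {}
    for row in read_view(raw_lines, "clickstream", ["user"]):
        counts[row["user"]] = counts.get(row["user"], 0) + 1
    return counts


if __name__ == "__main__":
    print(read_view(RAW_LAKE, "billing", ["customer_id", "amount_usd"]))
    print(page_views_per_user(RAW_LAKE))
```

Each new question adds another small view function over the same raw records, which is the incremental model Curran describes. It also shows why skill matters: every reader has to know what the raw fields mean.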
As part of its Intuit Analytics Cloud, Intuit has a data lake that includes clickstream user data and enterprise and third-party data, says Loconzolo, but the focus is on "democratizing" the tools surrounding it to enable business people to use it effectively. Loconzolo says one of his concerns with building a data lake in Hadoop is that the platform isn't really enterprise-ready. "We want the capabilities that traditional enterprise databases have had for decades -- monitoring access control, encryption, securing the data and tracing the lineage of data from source to destination," he says.
4. More predictive analytics
With big data, analysts have not only more data to work with, but also the processing power to handle large numbers of records with many attributes, Hopkins says. Traditional machine learning uses statistical analysis based on a sample of a total data set. "You now have the ability to do very large numbers of records and very large numbers of attributes per record" and that increases predictability, he says.
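The difference between working from a sample and working from every record can be sketched with synthetic data. This is an assumption-laden illustration rather than anything Hopkins describes: the record layout, the `rare` flag, the 1% sample size, and the helper `count_rare` are invented for the example, and no real data set is involved.

```python
import random

# Synthetic data set: 10,000 records, 10 of which carry a rare flag.
TOTAL_RECORDS = 10_000
RARE_EVERY = 1_000
records = [{"id": i, "rare": i % RARE_EVERY == 0} for i in range(TOTAL_RECORDS)]


def count_rare(rows):
    """Count the records that carry the rare flag."""
    return sum(1 for row in rows if row["rare"])


# A 1% simple random sample, seeded so repeated runs draw the same records.
rng = random.Random(0)
sample = rng.sample(records, 100)

rare_in_full = count_rare(records)
rare_in_sample = count_rare(sample)

print(f"full data set: {rare_in_full} rare records out of {len(records)}")
print(f"1% sample:     {rare_in_sample} rare records out of {len(sample)}")
```

With only 10 rare records in 10,000, a 1% sample is expected to contain about 0.1 of them, so a model trained on the sample would usually see few or no examples of the rare case. Processing the full data set guarantees that all 10 are available.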