The combination of big data and compute power also lets analysts explore new behavioral data throughout the day, such as websites visited or location. Hopkins calls that "sparse data," because to find something of interest you must wade through a lot of data that doesn't matter. "Trying to use traditional machine-learning algorithms against this type of data was computationally impossible. Now we can bring cheap computational power to the problem," he says. "You formulate problems completely differently when speed and memory cease being critical issues," Abbott says. "Now you can find which variables are best analytically by thrusting huge computing resources at the problem. It really is a game changer."
"To enable real-time analysis and predictive modeling out of the same Hadoop core, that's where the interest is for us," says Loconzolo. The problem has been speed: Hadoop has taken up to 20 times longer to answer questions than more established technologies. So Intuit is testing Apache Spark, a large-scale data processing engine, and its associated SQL query tool, Spark SQL. "Spark has this fast interactive query as well as graph services and streaming capabilities. It is keeping the data within Hadoop, but giving enough performance to close the gap for us," Loconzolo says.
5. SQL on Hadoop: Faster, better
These tools are nothing new. Apache Hive has offered a structured, SQL-like query language for Hadoop for some time. But commercial alternatives from Cloudera, Pivotal Software, IBM and other vendors not only offer much higher performance, but are also getting faster all the time. That makes the technology a good fit for "iterative analytics," where an analyst asks one question, receives an answer, and then asks another one. That type of work has traditionally required building a data warehouse. SQL on Hadoop isn't going to replace data warehouses, at least not anytime soon, says Hopkins, "but it does offer alternatives to more costly software and appliances for certain types of analytics."
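To make "iterative analytics" concrete, here is a minimal sketch of the ask, answer, ask-again loop. It uses Python's built-in sqlite3 module purely as a stand-in for a SQL-on-Hadoop engine, so it says nothing about how Hive or the commercial products behave or perform. The `visits` table, its columns, its rows and the `ask` helper are all invented for illustration.

```python
import sqlite3

# Hypothetical clickstream table; names and rows are invented for illustration.
# An in-memory SQLite database stands in for a SQL-on-Hadoop engine here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, site TEXT, minutes REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, "news", 12.0),
        (1, "shop", 3.5),
        (2, "news", 7.0),
        (2, "shop", 20.0),
        (3, "shop", 15.0),
    ],
)


def ask(sql, params=()):
    """Run one query and return all rows, mimicking an interactive session."""
    return conn.execute(sql, params).fetchall()


# Question 1: which site gets the most total time?
by_site = ask(
    "SELECT site, SUM(minutes) AS total FROM visits "
    "GROUP BY site ORDER BY total DESC"
)
top_site = by_site[0][0]

# Question 2, shaped by the first answer: who spends the most time there?
top_users = ask(
    "SELECT user_id, minutes FROM visits WHERE site = ? ORDER BY minutes DESC",
    (top_site,),
)

print(by_site)
print(top_users)
```

The second query cannot be written until the first one returns, which is why query latency matters so much for this style of work.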