8 big trends in big data analytics

Robert L. Mitchell | Oct. 24, 2014
Big data technologies and practices are moving quickly. Here's what you need to know to stay ahead of the game.

The combination of big data and compute power also lets analysts explore new behavioral data throughout the day, such as websites visited or location. Hopkins calls that "sparse data," because to find something of interest you must wade through a lot of data that doesn't matter. "Trying to use traditional machine-learning algorithms against this type of data was computationally impossible. Now we can bring cheap computational power to the problem," he says. "You formulate problems completely differently when speed and memory cease being critical issues," Abbott says. "Now you can find which variables are best analytically by thrusting huge computing resources at the problem. It really is a game changer."

"To enable real-time analysis and predictive modeling out of the same Hadoop core, that's where the interest is for us," says Loconzolo. The problem has been speed, with Hadoop taking up to 20 times longer to get questions answered than did more established technologies. So Intuit is testing Apache Spark, a large-scale data processing engine, and its associated SQL query tool, Spark SQL. "Spark has this fast interactive query as well as graph services and streaming capabilities. It is keeping the data within Hadoop, but giving enough performance to close the gap for us," Loconzolo says.

5. SQL on Hadoop: Faster, better
If you're a smart coder and mathematician, you can drop data in and do an analysis on anything in Hadoop. That's the promise -- and the problem, says Mark Beyer, an analyst at Gartner. "I need someone to put it into a format and language structure that I'm familiar with," he says. That's where SQL for Hadoop products come in, although any familiar language could work, says Beyer. Tools that support SQL-like querying let business users who already understand SQL apply similar techniques to that data. SQL on Hadoop "opens the door to Hadoop in the enterprise," Hopkins says, because businesses don't need to make an investment in high-end data scientists and business analysts who can write scripts using Java, JavaScript and Python -- something Hadoop users have traditionally needed to do.

These tools are nothing new. Apache Hive has offered a structured, SQL-like query language for Hadoop for some time. But commercial alternatives from Cloudera, Pivotal Software, IBM and other vendors not only offer much higher performance, but are also getting faster all the time. That makes the technology a good fit for "iterative analytics," where an analyst asks one question, receives an answer, and then asks another one. That type of work has traditionally required building a data warehouse. SQL on Hadoop isn't going to replace data warehouses, at least not anytime soon, says Hopkins, "but it does offer alternatives to more costly software and appliances for certain types of analytics."

