A typical full genome sequence can contain 120GB to 150GB of compressed data, requiring about half a terabyte of storage for processing, he says. In the past, it would take three days to analyze it, but with 30 to 40 machines running Hadoop, NextBio's staff can do it now in three to four hours. "For any application that has to make use of this data, this makes a big difference," Alag says.
Another big advantage is that he can keep scaling the system up as needed by simply adding more nodes. "Without Hadoop, scaling would be challenging and costly," he says. This so-called horizontal scaling -- adding more nodes of commodity hardware to the Hadoop cluster -- is a "very cost-effective way of scaling our system," Alag explains. The Hadoop framework "automatically takes care of nodes failing in the cluster."
That's dramatically changed the way the company can expand its computing power to meet its needs, he says. "We don't want to spend millions of dollars on infrastructure. We don't have that kind of money available."
Allows for new types of applications
One huge benefit of Hadoop is its ability to be able to analyze huge data sets to quickly spot trends, Lazzaro says. For a major retailer, that could mean scouring Facebook or Twitter user data to learn what scarf colors were in fashion last season, to be able to compare that information with today's hot color trends to help determine what will sell this season.
"It gives you the ability to look back in time to look for opportunities for new sales," Lazzaro says. This plays out at Concurrent when the firm analyzes a commercial or ad for a car dealership. "We can look at the data to see who's watched the commercials; then you might have a targeted sales lead you can leverage to make a sale. You don't always know what you are looking for."
Traditional databases can work for many sorting and analysis needs, but with ultra-large data sets, Hadoop can be a much more efficient way to find things, Lazzaro says. "It's really built for handling that."
For their part, eBay's engineers "like being able to work with unstructured data ... and build new products for eBay quickly," Williams says. Because eBay engineers can access the firm's 300 million listings, historical information and vast amounts of related information, Williams says, "this allows us to understand customers and build experiences they want." It's not really about the structured versus unstructured issue; rather, "it's about our engineers being able to roll up their sleeves and work with our data like never before," he says.
In the last year, eBay has done "some really amazing things with Hadoop, including improvements in merchandising, buyer experience and how customers use the site," Williams says.
Sign up for CIO Asia eNewsletters.