"It allows us to compose flexible MapReduce pipelines that simultaneously utilize both client and cloud resources for running the pipeline and automating data transfers," Feng explains. "This is where the HDInsight resource has been particularly useful."
Using HDInsight to Track and Analyze Social Media
Then there's data services company iTrend, which tracks and analyzes unstructured data generated by social media. iTrend built its new data discovery platform on a hybrid cloud implementation running on Windows Azure. The platform combines an Apache Hadoop cluster, which processes the raw data, with a relational database, which handles the extracted information. The Hadoop cluster currently runs on-premises, but iTrend plans to migrate it to Windows Azure HDInsight Service.
The platform allows iTrend to offer dynamic reporting tools through a customer portal, which customers use to track campaigns, brands and individual products. Once customers specify what they want to monitor, the tool automatically tracks, analyzes and summarizes potentially millions of conversations from multiple sources. It then presents the results in a dashboard, and users can drill down for a more detailed view.
"One search term for a relatively obscure topic such as a rare medical condition might return 100,000 results, while a search for a popular celebrity might generate 100 million," says Michael Alatsortsev, CEO of iTrend, explaining why a big data solution is necessary for iTrend's business.
"To work with high volumes of unstructured data, which is basically just text, we needed to be able to process it in parallel. Trying to do that with a relational database and all of the necessary infrastructure would be too costly."
"From the technology side, we can deploy faster on Windows Azure and add new modules quickly," Alatsortsev says. "And from a business perspective, we're seeing tremendous opportunities even with the first release of our service. We have the tools available to offer features that no one else has."
Sign up for CIO Asia eNewsletters.