Social data doesn't have to be big data to be useful

Thor Olavsrud | Sept. 26, 2012
Mention 'social data' or 'sentiment' these days and a conversation about big data is sure to follow, but you don't necessarily need a Hadoop cluster to leverage unstructured data. Sometimes all you need is Twitter's native Search API.

Once they had narrowed the population to users they could reliably follow geographically, they still needed to deal with class imbalance: health-related tweets are scarce compared to other types of messages, so reliably classifying them is tricky. To address this, they trained two different binary SVM classifiers (a support vector machine, or SVM, is an established machine learning model) to distinguish between tweets that indicated the tweeter was sick and all other tweets. One SVM classifier was heavily penalized for producing a false positive (labeling a normal tweet as one about sickness), while the other was heavily penalized for producing a false negative (labeling a tweet about sickness as a normal tweet).
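To make the idea of asymmetric penalties concrete, here is a minimal sketch of a cost-weighted hinge loss, the kind of objective a linear SVM minimizes. It is not the researchers' code: the function name, the two cost settings and the example scores are all hypothetical, chosen only to show how the same mistake can cost one classifier far more than the other.

```python
def weighted_hinge_loss(score, label, fp_cost, fn_cost):
    """Hinge loss scaled by an error-specific cost (illustrative only).

    score: raw classifier output; positive means "sick".
    label: +1 for a "sick" tweet, -1 for any other tweet.
    """
    hinge = max(0.0, 1.0 - label * score)
    # An error on a normal tweet is a false positive;
    # an error on a "sick" tweet is a false negative.
    cost = fn_cost if label == 1 else fp_cost
    return cost * hinge


# Hypothetical cost settings for the two classifiers described above.
PRECISION_ORIENTED = {"fp_cost": 10.0, "fn_cost": 1.0}  # punishes false positives
RECALL_ORIENTED = {"fp_cost": 1.0, "fn_cost": 10.0}     # punishes false negatives

# A normal tweet (label -1) that the model wrongly scores as sick (score 0.5).
false_positive_strict = weighted_hinge_loss(0.5, -1, **PRECISION_ORIENTED)
false_positive_lenient = weighted_hinge_loss(0.5, -1, **RECALL_ORIENTED)

print(false_positive_strict, false_positive_lenient)
```

During training, the first classifier would be pushed hard to avoid flagging normal tweets, while the second would be pushed hard to avoid missing "sick" ones.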

Part of that process involved weighting "features" (essentially keywords) to help the SVMs distinguish between "sick" and normal tweets. For instance, the feature "sick" in a message received a positive weight of 0.9579. However, the feature "sick of" received a negative weight of -0.4005, indicating a lower likelihood that the tweeter was ill.
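A rough sketch of how such weights could combine into a score follows. Only the two weights come from the article; the tokenization, the use of one- and two-word features, and the simple sum (with no bias term) are assumptions made for illustration, and the sample messages are invented.

```python
# The two weights quoted in the article; every other feature is treated as 0.
WEIGHTS = {"sick": 0.9579, "sick of": -0.4005}


def extract_features(text):
    """Return the set of one- and two-word features in a message.

    Assumes simple whitespace tokenization with no punctuation handling.
    """
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return set(tokens + bigrams)


def score(text, weights=WEIGHTS):
    """Sum the weights of every known feature present in the message."""
    return sum(weights.get(feature, 0.0) for feature in extract_features(text))


print(score("i feel sick today"))      # only "sick" fires
print(score("so sick of homework"))    # "sick" and "sick of" both fire
print(score("great game last night"))  # no known feature fires
```

Under these assumptions, a message containing "sick of" still triggers the "sick" feature, so the negative weight lowers the score rather than replacing it.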

At the end of that process, they were able to extract more than 700,000 "sick" messages. The researchers then studied the movements of the users who posted these messages, using their Twitter friendships to gain deeper insight into how the contagion spread:

"To quantify the effect of social ties on disease transmission, we leverage users' Twitter friendships," they wrote. "Clearly, there are complex events and interactions that take place 'behind the scenes', which are not directly recorded in online social media. However, this work posits that these latent events often exhibit themselves in the activity of the sample of people we can observe. For instance, as we will see, having social ties to infected people significantly increases your chances of becoming ill in the near future.

"However, we do not believe that the social ties themselves cause or even facilitate the spread of infection. Instead, the Twitter friendships are proxies and indicators for a complex set of phenomena that may not be directly accessible. For example, friends often eat out together, meet in classes, share items and travel together. While most of these events are never explicitly mentioned online, they are crucial from the disease transmission perspective. However, their likelihood is modulated by the structure of the social ties, allowing us to reason about contagion."

Marketers Use Twitter to Find Potential Customers

These techniques aren't just useful to researchers. Cold-remedy maker Cold-EEZE and social marketing firm Refine+Focus built Cold-EEZE's social marketing strategy around the research. Refine+Focus founder and CEO Zach Braiker explains that a Cold-EEZE community manager monitors Twitter for cold symptom indicators and then reaches out to form a connection with users tweeting about symptoms.

"We look for people who are expressing cough and cold symptoms," Braiker says. "We respond to nearly everyone that meets those certain criteria and often it creates a meaningful interaction. In some cases, it results in a real friendship."

