Cloudera offers its own distribution of Hadoop that serves as a platform for data management, and its Cloudera Enterprise provides large-scale data storage and analysis. Amr Awadallah, Cloudera's CTO, says the Hadoop distribution enables organizations to collect and combine social data and store it in a centralized data store. Users can then run MapReduce jobs to analyze this data for insights such as new relationships.
But who owns the data? Mozilla's Heilmann views Big Data as any information accumulated on the Web -- any real-time data. But who specifically owns this data? "That's a very loaded conversation," he says.
"I think it's dangerous right now that the speed and beauty of these interfaces [on sites such as Facebook] make people give information away without realizing that they have done it," Heilmann says. For example, people can upload photos of themselves intoxicated and a potential employer can view them for at least some time afterward.
"You have a real problem deleting anything from the Internet," Heilmann stresses. "As soon as you put it up there, it will be cached, it will be copied somewhere else. You should be very mature about what you put online."
GigaOm's Harris says ownership of the data depends on circumstances. "Certainly, the companies generating it own the data," he says.
Although there is publicly owned data on the Web, Facebook and Twitter, for their part, own the data their users generate, Harris notes. And Big Data concepts such as data marketplaces have resulted, for example, in firms analyzing Twitter streams for a month at a time, Harris says. "There's a lot [of data] that's just available out there if you could harness it" and analyze it.
Cloudera's Awadallah says the question of who owns unstructured data is a hard one to answer. Data such as customer purchasing information in Apple's (AAPL) App Store belongs to Apple, he says. And although Google (GOOG) gives users the right to delete data, it still owns the data itself, he adds.
In response, the Data Portability Project, which focuses on the porting of social network data, promotes the notion that users own their own data and that social networks should make it easier for users to move it around. The effort has produced an initiative that aims to get sites to disclose what users can do with their data once it has been uploaded, says Saad, who in addition to his Echo job is co-founder of the Data Portability Project.
Still, Saad notes that in some cases users share ownership and custody of their data with the online services they use. "It's kind of like money in a bank. You own the money but you are basically giving it to the bank to safeguard for you and potentially use on your behalf," he says.