As 'big data' grows, IT job roles, technology must change

Lucas Mearian, Computerworld | May 6, 2011
LAS VEGAS -- As corporate data stores continue to grow, in some cases by more than 50% annually, the expanding task of managing and mining them for information is forcing a change in how IT workers are trained.

EMC announced here that it has ramped up programs to train and certify "data scientists." A data scientist spends his or her time determining the value of a corporation's data.

Nick Mehta, CEO of cloud storage provider LiveOffice, said data persists whether it's properly stored or not.

"For us, the issue is, how do you enable a world where you can keep everything cost effectively? We want a way to keep everything and then make it valuable. Having all that data helps us do our jobs better," he said.

LiveOffice currently stores some 4 petabytes of data on disk and adds another 5TB to that pool each day. LiveOffice encrypts all of the data for customer safety.

LiveOffice uses data analytics tools, including MapReduce frameworks such as Hadoop and distributed databases such as Cassandra, to mine massive data stores on Isilon arrays. The tools let it search data for legal discovery and regulatory compliance requests, and they offer insight into customers' habits.

Stephen Martino, director of production operations at Harvard Medical School, said the time is coming when there will be a demand from corporate users for mining services.

What IT managers need is a way to track who is using what, and that capability is still missing from the tools vendors provide, he said.

"A researcher has no boundaries on how much they can store, even 1TB to 2TB per day. I think the biggest struggle we have is you need to gather data that spells out who in the research lab is consuming data for chargeback," he said.

Paul English, director of IT at 3Tier, which provides extensive weather data to renewable energy companies, said his IT staff had been spending hours a day in meetings to figure out where data goes and who is responsible for managing it. "We've never not been dealing with big data," he said. "We want to keep 10 or 20 years of climatological data. We have growth potential of many petabytes."

To address the data deluge, his company installed 14 Isilon NAS arrays to create an expandable pool, accessible by anyone in his company.

"Now [capacity is] delivered more as a utility," he said.

One continuing issue, the IT managers said, is data movement -- migrating it to the correctly priced storage tier and keeping it as close as possible to the people using it.

"You're talking terabytes per day that you can never keep up with on the operations side," Martino said. "You can never get that data from one site to another."

 
