The deluge of unstructured data continues to grow as well. "The tough challenge, which everyone is trying to solve, is unstructured data that's coming off documents that you wouldn't have expected to have to mine for information," says Vince Campisi, CIO at GE Software, a unit launched in 2011 that connects machines, big data and people to facilitate data analysis. "The traditional BI principles in concept and form still hold true, but the intensity of how much information is coming at you is much higher than the daily transactions in systems running your business."
How do you build a data storage strategy in the era of big data, scale your storage architecture to keep pace with data and business growth, and keep storage costs under control? Find out from big data veterans who share their storage sagas and explain how they have reinvented their storage strategies.
Lower-End Storage Does the Trick
In close political races, data can make a difference. Just ask the folks at Catalist. A Washington-based political consultancy, Catalist stores and mines data on 190 million registered voters and 90 million unregistered voters, including almost a billion "observations" of people based on public records such as real estate transactions or requests for credit reports. The information produced by its analytics tools tells campaign organizers whose door to knock on and can even prompt candidates to change their voter strategies overnight.
"We used to have a big EMC storage system that we retired a while back just because it was so expensive and consumed so much power," says Catalist CTO Jeff Crigler, noting that the EMC system also ran out of space. So the firm built a cluster of NAS servers, each holding about a petabyte of data. "It's essentially a big box of disks with a processor that's smart enough to make it act like an EMC-like solution," he says. Each server pairs high-density disk drives with some "fancy" configuration software and a very modest CPU to run that software.
Csaplar sees a growing trend away from expensive storage boxes that can cost more than $100,000 and toward lower-cost servers that are now capable of doing more work. "As servers get more powerful," he says, "they take over some of the work that you used to have specialized appliances do." It's similar to the way networking has evolved from network-attached hubs to a network interface card (NIC) on the back of the server to functionality residing on silicon as part of the CPU, he adds.
"I believe that storage is moving this way as well," says Csaplar. Instead of buying big expensive storage arrays, he says, companies are taking the JBOD (just a bunch of disks) approach — using nonintelligent devices for storage and using the compute capacity of the servers to manage it. "This lowers the overall cost of the storage, and you don't really lose any functionality — or maybe it does 80% of the job at 20% of the cost," he notes.
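The article does not name any specific software, so the following is only a hypothetical illustration of what "using the compute capacity of the servers to manage it" can mean. It is a minimal Python sketch that assumes a simplified scheme with a dedicated parity disk, similar in spirit to RAID 4. The server's CPU, rather than an array controller, computes XOR parity across blocks written to plain disks and can rebuild any single lost disk from the survivors. The function names `write_stripe`, `rebuild` and `xor_blocks` are illustrative, not taken from any product. Real software RAID or distributed storage systems also rotate parity, verify checksums and detect failures.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data: bytes, n_data_disks: int, block_size: int):
    """Split data across n_data_disks 'dumb' disks plus one parity disk.

    Hypothetical sketch: the server CPU does all the work; the disks just store bytes.
    """
    capacity = n_data_disks * block_size
    if len(data) > capacity:
        raise ValueError("data does not fit in one stripe")
    padded = data.ljust(capacity, b"\0")  # zero-pad the last block
    blocks = [padded[i * block_size:(i + 1) * block_size] for i in range(n_data_disks)]
    parity = xor_blocks(blocks)  # parity block = XOR of all data blocks
    return blocks + [parity]


def rebuild(disks, failed: int):
    """Recompute a failed disk's block by XOR-ing every surviving block."""
    survivors = [block for i, block in enumerate(disks) if i != failed]
    return xor_blocks(survivors)
```

For example, `disks = write_stripe(b"voter records", 3, 5)` spreads the data over three data disks plus a parity disk, and `rebuild(disks, 1)` recovers the second disk's block if that disk fails. The design choice mirrors Csaplar's point: the storage devices themselves stay simple and cheap, and the intelligence lives in server software.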