But by 2008, Zions hit a wall with SIEM. The volume and variety of data had grown beyond what the system could handle, and piecing together an actionable picture was taking months, even years. The sheer pace of data accumulation and the frequency of event analysis had simply overwhelmed SIEM.
"It's not that SIEM was obsolete and needed to be replaced with something else," Wood says. "It's that we needed something to augment SIEM. It was great for telling the data what to do, but it couldn't tell us what to do."
The Problem of Scale
The team went looking for the missing piece of the puzzle and soon found it in Hadoop.
Open-source Hadoop technology is the engine that drives many of today's more successful big-data security programs. Companies use it to gather, share and analyze massive amounts of structured and unstructured data flowing through their networks. Wood swears by it.
"Now, SIEM is for some data sources just a feed into the security data warehouse," Wood says. Hadoop became the central ingredient in building that warehouse. The company began moving to Hadoop in 2010, and within a year the team was using the platform exclusively. The positive results came quickly. Zions' myriad security tools and devices produce several terabytes of data per week; under the old system, loading a single day of logs was a daylong process. Now it happens in near real time.
That's crucial in a world where the bad guys have developed speedy methods of attacking company data and networks. Hadoop can process well over a hundred data sources at a time, uncovering pings on the perimeter, malware infecting parts of the network, social engineering attempts such as spear phishing, and more.
For many companies, Hadoop has also made big-data security affordable, according to Adrian Lane, CTO and security analyst at Securosis. "The cloud has made big data more accessible and affordable. Free tools like Hadoop have been a significant driver. It always comes down to money: what's cheaper," he says.
How Hadoop Works
The Apache Hadoop site describes the technology as "a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models." It's designed to scale up from single servers to thousands of machines, each offering local computation and storage. "Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures."
Hadoop includes the following modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
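The MapReduce model described above can be illustrated with a small, self-contained sketch: counting security events per source IP, the kind of aggregation a security data warehouse runs across log feeds. On a real cluster the two functions below would run as a Hadoop Streaming mapper and reducer across many machines, with Hadoop handling the shuffle and sort between them; here they are plain Python so the data flow is easy to follow. The log format is a hypothetical example, not any specific vendor's schema.

```python
# Minimal MapReduce-style sketch: count log events per source IP.
# Assumed (hypothetical) log line format: "<timestamp> <src_ip> <event>"
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (source_ip, 1) pair for every well-formed log line."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[1], 1

def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reducer: sum the counts for each key.
    (On a cluster, Hadoop groups the pairs by key before this step.)"""
    totals: Dict[str, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

logs = [
    "2013-03-01T10:00:01 10.0.0.5 login_failure",
    "2013-03-01T10:00:02 10.0.0.5 login_failure",
    "2013-03-01T10:00:03 192.168.1.9 port_scan",
]
counts = reduce_phase(map_phase(logs))
print(counts)  # {'10.0.0.5': 2, '192.168.1.9': 1}
```

Because the mapper looks at one line at a time and the reducer only needs the pairs sharing a key, both phases parallelize naturally, which is what lets the same program scale from a single server to thousands of machines.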