I've become a big fan of metrics. I wasn't always, but throughout my career in information security, I've had bosses who have challenged me on metrics, and I have honed my skills so that now I feel the metrics I collect meet the "SMART" test: specific, meaningful, actionable, repeatable and time-dependent.
For example, once a quarter I report on the patch and antivirus compliance of our DMZ and production infrastructure. I also report on the number of unmanaged resources that are discovered. These metrics have never been challenged: they are very specific, they are a measurement of risk (meaningful), there are clear actions to take (actionable), the method to collect and report them each quarter is clear (repeatable) and they measure risk over time (time-dependent). Other metrics I report on only once or twice a year. For example, I like to report on the security budget spent per employee, security head count as a percentage of IT head count, and the security budget as a percentage of the IT budget. I then compare those numbers against my peers and against industry analyst benchmarks (such as Gartner). I don't have many metrics, but the ones I have tell a good story and represent the security health and risk of the enterprise.
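To make the annual benchmark figures concrete, here is a minimal sketch of the arithmetic in Python. The class name, field names and function name are hypothetical choices for illustration, and any inputs fed to it would come from your own budget and head count records, not from this column.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkInputs:
    """Hypothetical container for the raw annual figures."""
    security_budget: float    # annual security spend
    it_budget: float          # annual IT spend
    employees: int            # total company head count
    security_headcount: int   # staff in the security function
    it_headcount: int         # staff in IT overall


def benchmark_ratios(x: BenchmarkInputs) -> dict:
    """Compute the three ratios that get compared against peer benchmarks."""
    return {
        "security_spend_per_employee": x.security_budget / x.employees,
        "security_headcount_pct_of_it": 100 * x.security_headcount / x.it_headcount,
        "security_budget_pct_of_it": 100 * x.security_budget / x.it_budget,
    }
```

Because the inputs and the formulas stay the same from one report to the next, the numbers remain repeatable and comparable over time.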
Now that our security operations center is up and running, I wanted to create some additional (and, of course, meaningful) metrics that measure the effectiveness of that function. As I mentioned in my previous installment, one of the problems we're currently having with our outsourced operations center is the high level of false positives related to malware incidents. To keep us from responding to too many false positives, and to avoid friction between the security department and the help desk, I have directed the Level 1 analysts to forward malware incidents to a Level 3 analyst for verification. If the Level 3 analyst determines that a malware event is a false positive, he annotates it and trains the Level 1 analyst accordingly. Until the number of false positives is driven down to a manageable level, only Level 3 analysts can open a malware ticket to have an incident acted on. I mention all of this because it's one of the items that I am now measuring (more on that below).
Our company's help desk ticketing system is not sophisticated enough to track the details that I am interested in collecting about security incidents. Therefore, I use a Microsoft SharePoint list to capture various elements of each incident. For example, I have our analysts annotate how the incident was detected: by our SIEM, the endpoint protection agent, an employee, a customer, law enforcement or another third-party agency. If the incident is related to malware (as most of our incidents are), I capture the department the user belongs to. If the incident turns out to be a false positive, I have the analysts mark it as such. I capture whether the incident was the result of a phishing attack or the installation of an application. I capture whether the malware is categorized as a rootkit, Trojan, info-stealer, browser helper object (BHO) or some other potentially unwanted program (PUP). I also track when the incident was detected and when the analyst actually started working on it. I capture these, as well as many other aspects of an incident, so that over time I can track trends such as which departments tend to be the source of malware infections, and concentrate awareness training there.
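The incident fields described above lend themselves to a simple record structure. What follows is a minimal sketch in Python, not the actual SharePoint schema: the `Incident` class, its field names and the three helper functions are all hypothetical, chosen only to show how a false-positive rate, a detection-to-response delay and a per-department count might be derived from such a list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Incident:
    """One row of a hypothetical incident-tracking list."""
    detected_by: str              # e.g. "SIEM", "endpoint agent", "employee"
    department: Optional[str]     # department of the affected user, if known
    false_positive: bool          # marked by the verifying analyst
    vector: Optional[str]         # e.g. "phishing", "application install"
    malware_type: Optional[str]   # e.g. "rootkit", "Trojan", "BHO", "PUP"
    detected_at: datetime         # when the incident was detected
    work_started_at: datetime     # when an analyst started working on it


def false_positive_rate(incidents: List[Incident]) -> float:
    """Fraction of incidents that were marked as false positives."""
    if not incidents:
        return 0.0
    return sum(i.false_positive for i in incidents) / len(incidents)


def mean_time_to_start(incidents: List[Incident]) -> timedelta:
    """Average delay between detection and an analyst starting work."""
    if not incidents:
        return timedelta(0)
    total = sum((i.work_started_at - i.detected_at for i in incidents),
                timedelta(0))
    return total / len(incidents)


def confirmed_by_department(incidents: List[Incident]) -> Counter:
    """Count confirmed (non-false-positive) incidents per department."""
    return Counter(i.department for i in incidents
                   if not i.false_positive and i.department)
```

Calling `confirmed_by_department(incidents).most_common(3)` on a quarter's worth of records would, under these assumptions, list the three departments with the most confirmed infections, which is the kind of trend that can steer awareness training.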