And because people are bad at remembering passwords, those attacks don’t just expose passwords for Microsoft systems. “If someone gets your credentials, we’ll see it; we’re able to see breached credentials very early. If one of your employees has reused their work credentials to set up an account on a shopping site and that’s been breached, the bad guys tend to test it against us to see if it works,” Weinert explains. “At that point, before those credentials are ever tried against your company, we can say that the credentials have gone bad and you should go protect that person.”
A method to the madness
Microsoft also gets to see the methods attackers are using. “We see where attacks are coming from at a very nuanced level, and what attacks are shaped like, in both the consumer and enterprise space,” says Weinert. “The adaptability of the bad guys means that the things that mattered yesterday may not matter today. And no-one in the enterprise space has the volume we have [to learn from].”
That volume amounts to tens of terabytes a day and 13 billion login transactions, all of it fodder for Microsoft’s machine learning systems to stay up to date on the latest attacks. But a deluge of data is only part of what you need to build a system like this. According to Weinert, “a relatively sophisticated and well-trained machine learning system takes years, and you also need some expert-level human supervision to look and see if there is anything the system isn’t catching.”
That matters because this is about more than spotting patterns and warning you later. As Weinert points out, “the goal is protection, not remediation. A lot of machine learning systems detect what’s happened. Our primary goal is to stop attacks getting through, so we’re training our protection systems. Every day we learn the nuances of the newest attack patterns … and we use the system to generate code on our front end servers that scores everything that comes through.” That score uses around a hundred factors, from browser user agent strings to the time of day.
A low score means the login is blocked or multi-factor authentication is turned on for that account. You might see false positives, with legitimate users being challenged, but Weinert believes that’s less likely than with traditional systems built on theories about behavior. A system like that might block your account because the desktop PC you’ve left on in the office is still connected (and might be writing files when it does a backup) while you sign in from somewhere else. The machine learning system, by contrast, can learn that you’re travelling and logging on from another PC in another location.
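To make the idea concrete, here is a minimal sketch of scoring a login and acting on the result. It is not Microsoft’s implementation: the three signals, their point values and both thresholds are invented for illustration, whereas the real system is described as using around a hundred factors whose weights are learned rather than hand-set.

```python
# Hypothetical signals and point values (assumptions, not Microsoft's factors).
# A higher total means the login looks more like the genuine user.
WEIGHTS = {
    "known_user_agent": 40,   # browser user agent seen before for this account
    "usual_hours": 20,        # time of day matches the user's normal pattern
    "familiar_location": 40,  # location consistent with recent activity
}

# Hypothetical thresholds on a 0-100 scale.
BLOCK_BELOW = 30
CHALLENGE_BELOW = 70


def score_login(signals: dict[str, bool]) -> int:
    """Add up the points for every signal that is present and true."""
    return sum(points for name, points in WEIGHTS.items() if signals.get(name, False))


def decide(score: int) -> str:
    """Map a score to an action: a low score blocks or challenges the login."""
    if score < BLOCK_BELOW:
        return "block"
    if score < CHALLENGE_BELOW:
        return "require_mfa"
    return "allow"


if __name__ == "__main__":
    attempt = {"known_user_agent": True, "usual_hours": True, "familiar_location": False}
    print(decide(score_login(attempt)))
```

In this sketch a user who is travelling loses the location points but keeps the others, so the login is challenged with multi-factor authentication rather than blocked outright.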
It’s not just the scale of data that makes a difference, he suggests. “As humans, we want to believe our hunch is right, we get very attached to our theories, but machine learning doesn’t care. Even if something is a strong signal today, if that fades out of fashion the system is completely willing to throw that away and pick out a new pattern. It adapts to the reality of what actually results in a compromise, not our suspicions or our suppositions. As a result, our precision – which is the number of times when we’re targeting someone that they’re actually a bad guy – is very high.”
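Precision, in the standard machine learning sense, is the share of flagged logins that really were attacks. A short sketch of that calculation follows; the function name and the sample counts are illustrative assumptions, not figures from Microsoft.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged logins that were genuinely malicious."""
    flagged = true_positives + false_positives
    if flagged == 0:
        # Nothing was flagged, so precision is undefined; this sketch returns 0.0.
        return 0.0
    return true_positives / flagged


if __name__ == "__main__":
    # Made-up counts: 3 flagged logins were attackers, 1 was a legitimate user.
    print(precision(true_positives=3, false_positives=1))
```

High precision means few legitimate users are challenged by mistake. It says nothing about how many attacks slip through unflagged, which is measured separately as recall.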