This emerging picture should inform your collection efforts: you might need to obtain a new information source to follow up a lead from an earlier analysis, or discard one (along with the cost of collecting and analysing it) once you realise it isn't helping.
- More data is always good. The case for accumulating more data (Big Data) is strong: not only does it bring deeper insights, it can also reduce your compute workload. In Jeff's experience, once a network passes a certain size, the time it takes to link a new observation into it actually goes down as the total number of observations goes up.
One of the most interesting new sources of Big Data insight is data about how people interact with systems, even their mistakes! That's how Google knows to ask "did you mean this?" when you mistype a search.
- Can you count? Good! Accurately counting entities (people, cases, transactions), which is the job of Entity Resolution, is critical to deeper analysis: if you can't count, you can't determine a vector or velocity, and without those you can't make predictions. Many interesting fraud-detection analyses hinge on identity: accurately counting people, knowing when two identities are the same person, when one identity is actually being used by more than one person, or even when an identity doesn't belong to a real person at all. Identity matching is also what makes it possible to catch dead people "voting" and similar fraud.
- Privacy matters, but it's not an obstacle. Once identity comes into play, privacy concerns (and regulations) must of course be taken into consideration. But advanced techniques such as one-way hashes can anonymise a dataset without reducing its usefulness for analytical purposes: because the same input always produces the same hash, records can still be matched and counted without exposing the underlying identifiers.
- Bad guys can be smart, too. Skilled adversaries present unique problems, but they can be overcome. To catch them, you must either collect observations the adversary doesn't know you have (e.g. a camera on a route they don't know about), or compute over your observations in a way the adversary can't imagine (e.g. recognising faces or license plates and correlating them with other location information).
As adversaries get smarter and better at avoiding detection, savvy analysts must keep pushing the envelope, applying new techniques and technology to the game.
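To make the "Can you count?" point concrete, here is a minimal entity-resolution sketch. It is not Jeff's system: it assumes, purely for illustration, that two records refer to the same person whenever they share a normalised email address or phone number, and it uses a union-find structure so that links chain together transitively. The records, field names and the `resolve` and `normalise` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample observations; in practice these would come from several source systems.
RECORDS = [
    {"id": "r1", "name": "Jane Smith", "email": "jane@example.com", "phone": "555-0100"},
    {"id": "r2", "name": "J. Smith", "email": "JANE@example.com", "phone": None},
    {"id": "r3", "name": "Jane Smyth", "email": None, "phone": "(555) 0100"},
    {"id": "r4", "name": "Bob Jones", "email": "bob@example.com", "phone": "555-0199"},
]


def normalise(key, value):
    """Reduce a raw attribute to a comparable form, or None if it is missing."""
    if not value:
        return None
    if key == "phone":
        digits = "".join(ch for ch in value if ch.isdigit())
        return digits or None
    return value.strip().lower()


def resolve(records, keys=("email", "phone")):
    """Group record ids into entities; assumption: a shared normalised key means same person."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # (key, normalised value) -> id of the first record carrying it
    for r in records:
        for k in keys:
            v = normalise(k, r.get(k))
            if v is None:
                continue
            if (k, v) in first_seen:
                union(first_seen[(k, v)], r["id"])
            else:
                first_seen[(k, v)] = r["id"]

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return list(clusters.values())


if __name__ == "__main__":
    clusters = resolve(RECORDS)
    print(len({r["name"] for r in RECORDS}), "distinct names")  # 4 distinct names
    print(len(clusters), "resolved entities")  # 2 resolved entities
    print(clusters)  # [['r1', 'r2', 'r3'], ['r4']]
```

Counting names naively gives four "people"; resolving gives two. The transitive linking is also where the harder cases from the text show up: a phone shared by a household would merge different people, which is exactly the "one identity used by more than one person" situation a real system has to detect rather than blindly merge.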
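The privacy bullet mentions one-way hashes. The sketch below shows the idea of matching records across two organisations without either side sharing raw identifiers. One choice here is ours rather than the article's: it uses a keyed hash (HMAC-SHA256 from Python's standard `hmac` module) instead of a plain hash, because an unkeyed hash of a low-entropy identifier such as a phone number can be reversed by hashing every possible input. The key, the datasets and the helper names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret. Only parties allowed to match records hold it;
# a real deployment would load it from a key-management system.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymise(value, key=SECRET_KEY):
    """Keyed one-way hash (HMAC-SHA256) of a normalised identifier."""
    normalised = value.strip().lower()  # normalise first: hashing destroys near-matches
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymise(rows, field):
    """Return copies of rows with `field` replaced by its pseudonym."""
    return [{**row, field: pseudonymise(row[field])} for row in rows]


# Two hypothetical datasets held by different organisations.
bank = [
    {"email": "Jane@Example.com", "balance": 1200},
    {"email": "bob@example.com", "balance": 50},
]
airline = [{"email": " jane@example.com", "flights": 7}]

bank_anon = anonymise(bank, "email")
airline_anon = anonymise(airline, "email")

# Records still join on the pseudonym even though neither side sees raw emails.
shared = {r["email"] for r in bank_anon} & {r["email"] for r in airline_anon}
print(len(shared), "customer(s) in common")  # 1 customer(s) in common
```

Normalisation happens before hashing on purpose: once values are hashed, "Jane@Example.com" and " jane@example.com" would no longer match, so any cleanup has to be agreed on and applied by every party first.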
How To Stay Ahead Of The Game
Jeff pointed out that location data presents tantalising new possibilities for insight. There are 600 billion location records created every day in the US alone! This data is routinely de-identified and shared with multiple third parties, in volume and in real time, and it's amazing what you can figure out from it. Consider the example of Malte Spitz, who, as an act of political protest over his privacy concerns, sued Deutsche Telekom for the release of his location records. They revealed that over six months he "hung out" 2,400 times at 130 unique places. Jeff's point: know three of those locations - home (where he sleeps at night), work (where he goes in the daytime) and the pub (where he meets friends, which links to their trails of location data) - and you can tell who the person is despite the anonymised data, and who his friends are.
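A minimal sketch of why "home, work and pub" is enough: from a de-identified trace of (time, place) records, simple time-of-day rules recover the anchor locations, and a pair like (home, work) can then be looked up against outside knowledge. The trace, the place ids, the hour thresholds and the `KNOWN_PEOPLE` directory are all hypothetical assumptions for illustration, not real data or the method used on Spitz's records.

```python
from collections import Counter
from datetime import datetime

# Hypothetical de-identified trace: (timestamp, place_id). No names, just places and times.
PINGS = [
    ("2024-03-04T02:10", "place_17"),
    ("2024-03-04T10:30", "place_42"),
    ("2024-03-04T15:00", "place_42"),
    ("2024-03-04T21:00", "place_88"),
    ("2024-03-05T03:45", "place_17"),
    ("2024-03-05T11:15", "place_42"),
    ("2024-03-05T23:30", "place_17"),
]


def is_night(hour):
    # Assumption: 22:00-05:59 is when someone is usually at home asleep.
    return hour >= 22 or hour < 6


def is_work_hours(hour, weekday):
    # Assumption: work is Monday-Friday (weekday 0-4), 09:00-16:59.
    return weekday < 5 and 9 <= hour < 17


def top(counter, exclude=()):
    """Most frequent place in counter, ignoring any in exclude."""
    for place, _ in counter.most_common():
        if place not in exclude:
            return place
    return None


def infer_anchors(pings):
    night, work, other = Counter(), Counter(), Counter()
    for ts, place in pings:
        t = datetime.fromisoformat(ts)
        if is_night(t.hour):
            night[place] += 1
        elif is_work_hours(t.hour, t.weekday()):
            work[place] += 1
        else:
            other[place] += 1  # evenings and weekends: candidate social places
    home = top(night)
    office = top(work, exclude=(home,))
    social = top(other, exclude=(home, office))
    return {"home": home, "work": office, "social": social}


# Hypothetical outside knowledge, e.g. an address directory plus a staff page.
KNOWN_PEOPLE = {("place_17", "place_42"): "person_A"}

anchors = infer_anchors(PINGS)
print(anchors)  # {'home': 'place_17', 'work': 'place_42', 'social': 'place_88'}
print(KNOWN_PEOPLE.get((anchors["home"], anchors["work"])))  # person_A
```

No name appears anywhere in the trace, yet the (home, work) pair behaves like an identifier, and the social place is the link into other people's traces, which is how friends could be found as well.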
Sign up for CIO Asia eNewsletters.