

Big data collection makes it hard for you to remain anonymous

Taylor Armerding | March 31, 2015
Effective techniques exist to “de-identify” personal information in Big Data collection. But what really matters is how often they are applied. And most experts say that's not very often.


How anonymous is "anonymous" in today's digital world?

Not the hacktivist collective -- this is about how anonymous average people are when the data they generate is vacuumed up by everyone from marketers and websites to law enforcement, researchers and government agencies.

Is Big Data collection, even with personally identifiable information (PII) stripped out or encrypted, still vulnerable to "re-identification" techniques that pinpoint individuals so precisely that intrusive surveillance is possible, or already underway?
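
To make the re-identification question concrete, below is a minimal Python sketch of the classic "linkage attack." The records and field names are invented for illustration, but the mechanism is real: a dataset with names stripped out can be joined against a public dataset, such as a voter roll, on quasi-identifiers like ZIP code, birth date and sex.

    # Hypothetical "anonymized" records: names removed, quasi-identifiers intact.
    anonymized_records = [
        {"zip": "02138", "birth_date": "1945-07-21", "sex": "F", "diagnosis": "hypertension"},
        {"zip": "02139", "birth_date": "1962-03-02", "sex": "M", "diagnosis": "diabetes"},
    ]

    # Hypothetical public dataset (e.g., a voter roll) with names attached.
    public_records = [
        {"name": "Jane Doe", "zip": "02138", "birth_date": "1945-07-21", "sex": "F"},
        {"name": "John Roe", "zip": "02140", "birth_date": "1971-11-30", "sex": "M"},
    ]

    QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

    def link(anon_rows, public_rows, keys=QUASI_IDENTIFIERS):
        """Join the two datasets on quasi-identifiers; every match re-identifies a row."""
        index = {tuple(row[k] for k in keys): row["name"] for row in public_rows}
        for row in anon_rows:
            name = index.get(tuple(row[k] for k in keys))
            if name:
                yield name, row["diagnosis"]

    for name, diagnosis in link(anonymized_records, public_records):
        print(name, "->", diagnosis)  # prints: Jane Doe -> hypertension

The join itself is a single lookup per record; the open question is how often published datasets leave such quasi-identifiers intact.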

Or can "de-identification" leave individuals comfortably faceless in an ocean of data that is used only to spot trends, track the spread of disease, map high-crime areas and do other things that improve the economic well-being or health of a population?

Don't expect a unanimous answer from IT and privacy experts; the debate is ongoing.

Among those on one side are the authors of a June 2014 white paper, "Big Data and Innovation, Setting the Record Straight: De-identification Does Work," sponsored by the Information and Privacy Commissioner (IPC) of Ontario, Canada and the Information Technology & Innovation Foundation (ITIF). They argue that privacy advocates and their media enablers should chill out.

Lead authors Daniel Castro and Ann Cavoukian decry what they call "misleading headlines and pronouncements in the media," which they say suggest that anyone with even a moderate amount of expertise and the right technology tools can expose those whose data have been anonymized.

The fault for the spread of this "myth," they say, lies not with the findings presented by researchers in the primary literature, but with "a tendency on the part of commentators on that literature to overstate the findings."

They contend that de-identification, done correctly, is close to bulletproof, reducing the chance of a person being identified to less than 1% -- far less than the risk of simply taking out trash containing documents that might have PII in them.

They also argue that unwarranted fear of a loss of anonymity may undermine "advancements in data analytics [that] are unlocking opportunities to use de-identified datasets in ways never before possible ... [to] create substantial social and economic benefits."

But they do acknowledge that, to be effective, "creating anonymized datasets requires statistical rigor, and should not be done in a perfunctory manner."
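
What that statistical rigor can look like in practice is illustrated by the k-anonymity family of techniques: quasi-identifiers are coarsened, and any record that would still stand out in a group smaller than k is suppressed. The sketch below is a simplified illustration, not the white paper's prescribed method, and its data and field names are invented.

    from collections import Counter

    # Hypothetical records; field names are invented for illustration.
    records = [
        {"zip": "02138", "birth_date": "1945-07-21", "sex": "F", "diagnosis": "hypertension"},
        {"zip": "02139", "birth_date": "1945-02-14", "sex": "F", "diagnosis": "asthma"},
        {"zip": "90210", "birth_date": "1980-05-05", "sex": "M", "diagnosis": "diabetes"},
    ]

    def generalize(r):
        """Coarsen quasi-identifiers: truncate ZIP to 3 digits, birth date to year."""
        return {"zip": r["zip"][:3] + "**",
                "birth_year": r["birth_date"][:4],
                "sex": r["sex"],
                "diagnosis": r["diagnosis"]}

    def k_anonymize(rows, k=2):
        """Drop any record whose generalized quasi-identifiers are shared by
        fewer than k records, so no one stands out in a group smaller than k."""
        coarse = [generalize(r) for r in rows]
        key = lambda r: (r["zip"], r["birth_year"], r["sex"])
        counts = Counter(key(r) for r in coarse)
        return [r for r in coarse if counts[key(r)] >= k]

    print(k_anonymize(records))  # the 90210 record is suppressed: its group has size 1

Run on the three sample records, the two 1945-born women in adjacent 021xx ZIP codes survive as an indistinguishable group of two, while the lone 90210 record is suppressed rather than published.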

And that, according to Pam Dixon, executive director of the World Privacy Forum (WPF), is the problem. She and others contend that outside of the controlled environment of academic research, both anonymity and privacy are essentially dead.

Dixon doesn't quarrel with the white paper's contention that de-identification can be effective, but said that "in the wild," not all datasets are going to be rigorously anonymized.
