"In the real world, people aren't going to do that all the time," she said. "To actually get true anonymity in big data, you have to go to an extraordinarily broad aggregate level.
"If you're talking just about data collected for statewide or citywide trends, then it can be de-identified because it's not talking about individuals. But if you're talking how many had the flu in Boston, and any kind of ZIP code data is available, that's different," she said.
Joseph Lorenzo Hall, chief technologist at the Center for Democracy & Technology, agrees that while rigorous de-identification is demonstrably effective, the world of data collection does not always meet the ideal. One reason for that, he said, is that truly impregnable de-identification makes data much less useful.
"The essential feature of these sets of data that make re-identification feasible is that records of behavior from the same individual are linked to one another," he said. "That's a big part of the benefit for keeping these records.
"The big problem is public release of data sets that have been poorly anonymized and sharing between private parties data sets that they consider to not contain personal information, when they definitely contain some sort of persistent identifier that could be trivially associated with an individual."
And while some data collection is clearly aimed at the economic well-being or health of people, Hall notes that plenty more is not. "Many retail establishments use Wi-Fi tracking that uses your device's MAC address (a persistent network identifier) to track you through the store," he said.
"This is why Apple has begun randomizing these addresses as announced to the network."
Paul O'Neil, senior information security adviser at IDT911 Consulting, has much the same view. "If de-identification is done properly, then yes, it can work," he said. "But that is a much bigger 'if' than most people realize."
Raul Ortega, head of global presales at Protegrity, also notes how uneven the protection of data is. "Credit card protection is improving, while there is very little being done to de-identify the hordes of PII data that exist in every company," he said.
Part of the problem, say legal experts, may be one of semantics, which leads to public confusion. "We need to be clear what we mean when we call data anonymous," said Kelsey Finch, policy counsel at the Future of Privacy Forum (FPF).
She said only data that has had both direct and indirect identifiers removed should be called "anonymous," while data that still has indirect identifiers should be termed "pseudonymous."
"Very often, advertising companies that track and profile users' cookies or mobile device identifiers call that data anonymous," she said. "However, these same data are often considered personal by privacy advocates, because they can be linked over time to an individual."