Study: It's Not Hard to Connect Anonymous Data to Specific Individuals
Researchers from Université catholique de Louvain in Belgium and Imperial College London have debunked the notion, promoted by tech companies, that data can be reliably anonymized.
"Using machine learning, the researchers developed a system to estimate the likelihood that a specific person could be re-identified from an anonymized data set containing demographic characteristics," according to an article by Nick Wells and Leslie Picker. "The researchers’ model suggests that over 99% of Americans could be correctly re-identified from any dataset using 15 demographic attributes, including age, gender and marital status."
The research was published in the journal Nature Communications. As part of the effort, the researchers "published an online tool to help people understand how likely it is for them to be re-identified, based on just three common demographic characteristics: gender, birth date and ZIP code."
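To see why a few demographic attributes can single a person out, consider the minimal sketch below. It is not the researchers' machine-learning model or their online tool. It only counts how many records in a small, made-up data set are unique on a chosen combination of attributes. The function name `unique_fraction`, the field names, and all record values are hypothetical and chosen for illustration.

```python
from collections import Counter


def unique_fraction(records, attributes):
    """Return the share of records whose combination of the given
    attributes appears exactly once in the data set."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys) if keys else 0.0


# Hypothetical toy records; every value here is invented for illustration.
records = [
    {"gender": "F", "birth_date": "1980-03-14", "zip": "10001"},
    {"gender": "F", "birth_date": "1980-03-14", "zip": "10001"},
    {"gender": "M", "birth_date": "1975-11-02", "zip": "10001"},
    {"gender": "F", "birth_date": "1992-07-30", "zip": "94110"},
]

# Gender alone singles out only one of the four records.
print(unique_fraction(records, ["gender"]))  # 0.25

# Adding birth date and ZIP code makes half of the records unique.
print(unique_fraction(records, ["gender", "birth_date", "zip"]))  # 0.5
```

Even in this toy example, each added attribute shrinks the group of people who share the same combination, which is the basic effect behind re-identification.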
A quote from Yves-Alexandre de Montjoye, one of the researchers, sums up the problem raised by the study's findings and its implications for fields like planning, where big data has promised large benefits to society: "The goal of anonymization is so we can use data to benefit society," said Montjoye. "This is extremely important but should not and does not have to happen at the expense of people’s privacy."