Sciweavers

KDD
2008
ACM

The cost of privacy: destruction of data-mining utility in anonymized data publishing

Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "quasi-identifier" attributes such as ZIP code and birthdate. Their objective is usually syntactic sanitization: for example, k-anonymity requires that each "quasi-identifier" tuple appear in at least k records, while ℓ-diversity requires that the distribution of sensitive attributes for each quasi-identifier have high entropy. The utility of sanitized data is also measured syntactically, by the number of generalization steps applied or the number of records with the same quasi-identifier. In this paper, we ask whether generalization and suppression of quasi-identifiers offer any benefits over trivial sanitization which simply separates quasi-identifiers from sensitive attributes. Previous work showed that k-anonymous databases can be useful for data mining, but k-anonymization does not guarantee any ...
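The two syntactic conditions named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the record layout (a list of dicts), and the sample attributes `zip`, `birth_year`, and `diagnosis` are all hypothetical. The diversity check assumes the common entropy formulation, in which every quasi-identifier group must have sensitive-value entropy of at least log(l).

```python
from collections import Counter, defaultdict
from math import log


def is_k_anonymous(records, quasi_ids, k):
    """True if every quasi-identifier tuple appears in at least k records."""
    counts = Counter(tuple(r[a] for a in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())


def group_entropies(records, quasi_ids, sensitive):
    """Entropy of the sensitive attribute within each quasi-identifier group."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in quasi_ids)].append(r[sensitive])
    entropies = {}
    for key, values in groups.items():
        n = len(values)
        entropies[key] = -sum(
            (c / n) * log(c / n) for c in Counter(values).values()
        )
    return entropies


def is_entropy_l_diverse(records, quasi_ids, sensitive, l, tol=1e-12):
    """Assumed entropy formulation: each group's entropy is at least log(l)."""
    entropies = group_entropies(records, quasi_ids, sensitive)
    return all(h >= log(l) - tol for h in entropies.values())


# Hypothetical, already-generalized records (illustration only).
records = [
    {"zip": "787**", "birth_year": "197*", "diagnosis": "flu"},
    {"zip": "787**", "birth_year": "197*", "diagnosis": "asthma"},
    {"zip": "787**", "birth_year": "198*", "diagnosis": "flu"},
    {"zip": "787**", "birth_year": "198*", "diagnosis": "flu"},
]
quasi_ids = ["zip", "birth_year"]

print(is_k_anonymous(records, quasi_ids, 2))
print(is_entropy_l_diverse(records, quasi_ids, "diagnosis", 2))
```

In this sample the table satisfies the k-anonymity check for k = 2, yet the second group carries a single diagnosis, so its entropy is zero and the diversity check fails for l = 2. Both checks look only at the shape of the table, which is the sense in which the abstract calls them syntactic.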
Justin Brickell, Vitaly Shmatikov
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2008
Where KDD