
WWW
2007
ACM

On anonymizing query logs via token-based hashing

In this paper we study the privacy preservation properties of a specific technique for query log anonymization: token-based hashing. In this approach, each query is tokenized, and then a secure hash function is applied to each token. We show that statistical techniques may be applied to partially compromise the anonymization. We then analyze the specific risks that arise from these partial compromises, focusing on the revelation of identity from unambiguous names, addresses, and so forth, and the revelation of facts associated with an identity that are deemed to be highly sensitive. Our goal in this work is twofold: to show that token-based hashing is unsuitable for anonymization, and to present a concrete analysis of specific techniques that may be effective in breaching privacy, against which other anonymization schemes should be measured.
Categories and Subject Descriptors H.3.m [Information Storage and Retrieval]: Miscellaneous
General Terms Algorithms, Experimentation, Measurements
Key...
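The token-based hashing scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the use of HMAC-SHA256 as the "secure hash function", the key `SECRET_KEY`, and the sample queries are all assumptions chosen for illustration. The sketch also shows why statistical techniques have something to work with: identical tokens always map to identical hashes, so token frequencies and co-occurrences survive anonymization.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key; the abstract only says "a secure hash function".
SECRET_KEY = b"hypothetical-secret-key"


def hash_token(token: str, key: bytes = SECRET_KEY) -> str:
    """Hash one token with HMAC-SHA256 (an assumed choice of hash)."""
    digest = hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened only for readability


def anonymize_query(query: str) -> list[str]:
    """Tokenize on whitespace (an assumption) and hash each token."""
    return [hash_token(tok) for tok in query.lower().split()]


# Made-up sample queries, not data from the paper.
queries = ["new york hotels", "new york pizza", "cheap hotels"]
hashed_log = [anonymize_query(q) for q in queries]

# Frequency profiles before and after hashing.
plain_counts = Counter(tok for q in queries for tok in q.lower().split())
hashed_counts = Counter(h for hq in hashed_log for h in hq)

# The multiset of frequencies is unchanged by hashing, which is the
# kind of signal a frequency-based attack could try to exploit.
print(sorted(plain_counts.values()))
print(sorted(hashed_counts.values()))
```

Because the mapping is deterministic, an attacker holding an unhashed reference log with a similar token distribution could attempt to align frequent hashes with frequent plaintext tokens. The sketch only demonstrates the preserved statistics, not any specific attack from the paper.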
Ravi Kumar, Jasmine Novak, Bo Pang, Andrew Tomkins
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2007
Where WWW
Authors Ravi Kumar, Jasmine Novak, Bo Pang, Andrew Tomkins