
KDD
2007
ACM

Privacy-Preserving Sharing of Horizontally-Distributed Private Data for Constructing Accurate Classifiers

Data mining tasks such as supervised classification can often benefit from a large training dataset. However, in many application domains, privacy concerns can prevent multiple sites from combining their datasets to construct an accurate classifier. In this work, we propose a novel privacy-preserving distributed data sanitization algorithm that randomizes the private data at each site independently before the data are pooled to build a classifier at a centralized site. Other researchers have proposed distance-preserving perturbation approaches, but we show that these can be susceptible to security risks. To enhance security, we require a non-distance-preserving approach. We use Kernel Density Estimation (KDE) Resampling, in which samples are drawn independently from a distribution that approximates the original data's distribution. KDE Resampling provides consistent density estimates with randomized samples that are asymptotically independent of the origina...
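To make the contrast in the abstract concrete, here is a minimal sketch, not the authors' implementation, of KDE resampling next to one familiar distance-preserving perturbation (a secret random rotation). It assumes a Gaussian kernel, per-feature bandwidths from Silverman's rule of thumb, and per-class resampling so that synthetic rows keep their labels; none of these choices is stated in the abstract. The function names `kde_resample`, `sanitize_site` and `random_rotation_2d` are hypothetical.

```python
import math
import random
from statistics import stdev


def kde_resample(data, m, rng=None):
    """Draw m synthetic rows from a Gaussian-kernel density estimate of `data`.

    data: list of equal-length numeric rows (at least 2) held at one site.
    Per-feature bandwidths follow Silverman's rule of thumb; this choice is an
    assumption, since the abstract does not state how the bandwidth is set.
    """
    if rng is None:
        rng = random.Random()
    n, d = len(data), len(data[0])
    factor = (4.0 / (d + 2)) ** (1.0 / (d + 4)) * n ** (-1.0 / (d + 4))
    bandwidths = [factor * stdev([row[j] for row in data]) for j in range(d)]
    samples = []
    for _ in range(m):
        center = rng.choice(data)  # pick one kernel (an original row) uniformly
        # Sample from that Gaussian kernel: add independent noise per feature.
        samples.append([x + rng.gauss(0.0, h) for x, h in zip(center, bandwidths)])
    return samples


def sanitize_site(rows_by_label, rng):
    """Hypothetical per-site step: resample each class separately so every
    synthetic row keeps a class label (an assumption of this sketch)."""
    released = []
    for label, rows in rows_by_label.items():
        released.extend((row, label) for row in kde_resample(rows, len(rows), rng))
    return released


def random_rotation_2d(data, rng):
    """A distance-preserving perturbation for comparison: rotate 2-D points
    by one secret random angle. All pairwise distances are unchanged."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y, s * x + c * y] for x, y in data]
```

In this sketch, each site would call `sanitize_site` on its own data with its own random generator and release only the returned rows; the central site concatenates the released lists and trains a classifier on them. The rotation keeps every pairwise distance exactly, which is convenient for distance-based classifiers but also gives an adversary who knows a few original records a structure to exploit. The resampled rows, by contrast, are fresh draws from the estimated density, so no released row is a transformed copy of a particular original record.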
Vincent Yan Fu Tan, See-Kiong Ng
Type Conference
Year 2007
Where KDD