Sciweavers

PAKDD
2009
ACM

Accurate Synthetic Generation of Realistic Personal Information

A large proportion of the massive amounts of data collected by many organisations today is about people, and often contains identifying information such as names, addresses, dates of birth, or social security numbers. Privacy and confidentiality are of great concern when such data is processed and analysed, and when it needs to be shared between organisations or made publicly available. The research area of data linkage suffers in particular from a lack of publicly available real-world data sets, because experimental evaluations and comparisons are difficult to conduct without real data. To overcome this problem, we have developed a data generator that allows flexible creation of synthetic data with realistic characteristics, such as frequency distributions and error probabilities. Our data generator improves significantly on similar earlier approaches, and allows the creation of data containing records for individuals, households and families.
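The two characteristics named in the abstract, frequency distributions and error probabilities, can be illustrated with a minimal Python sketch. This is not the authors' generator: the frequency table, the function names (sample_value, corrupt, generate) and the three character-level edit operations are all assumptions chosen for illustration, and the counts are invented.

```python
import random

# Hypothetical frequency table with made-up counts (not taken from the paper).
SURNAME_FREQ = {"smith": 50, "jones": 30, "nguyen": 15, "christensen": 5}


def sample_value(freq, rng):
    """Draw one value with probability proportional to its frequency count."""
    values = list(freq)
    weights = [freq[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def corrupt(value, error_prob, rng):
    """With probability error_prob, apply one random character-level edit."""
    if len(value) < 2 or rng.random() >= error_prob:
        return value
    pos = rng.randrange(len(value) - 1)  # leaves room for a transposition
    op = rng.choice(["delete", "substitute", "transpose"])
    if op == "delete":
        return value[:pos] + value[pos + 1:]
    if op == "substitute":
        letter = rng.choice("abcdefghijklmnopqrstuvwxyz")
        return value[:pos] + letter + value[pos + 1:]
    # Transpose the characters at pos and pos + 1.
    return value[:pos] + value[pos + 1] + value[pos] + value[pos + 2:]


def generate(n_originals, error_prob, seed=0):
    """Return (record_id, surname) pairs: one original and one duplicate each."""
    rng = random.Random(seed)
    records = []
    for i in range(n_originals):
        surname = sample_value(SURNAME_FREQ, rng)
        records.append((f"rec-{i}-org", surname))
        records.append((f"rec-{i}-dup", corrupt(surname, error_prob, rng)))
    return records
```

Calling generate(1000, 0.2) under these assumptions would yield 2,000 records in which roughly one duplicate in five differs from its original by a single edit. Because the record identifiers mark which pairs are true matches, a linkage algorithm can be scored against them.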
Peter Christen, Agus Pudjijono
Added 26 Jul 2010
Updated 26 Jul 2010
Type Conference
Year 2009
Where PAKDD