Aggregate queries on probabilistic record linkages

Record linkage analysis, which matches records from different data sets that refer to the same real-world entities, is an important task in data integration. Uncertainty often exists in record linkages because the data is incomplete or ambiguous. Fortunately, state-of-the-art probabilistic record linkage methods can compute the probability that two records refer to the same entity. In this paper, we study a novel class of aggregate queries on probabilistic record linkages, such as counting the number of matched records. We address several fundamental issues. First, we advocate that the answer to an aggregate query on probabilistic record linkages is a probability distribution over possible answers derived from possible worlds. Second, we identify the category of compatible linkages, the only linkages on which the answers to aggregate queries can be determined properly when the probabilities of individual linkages are available but the joint distributions of multiple linkages are unavaila...
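As a rough illustration of the possible-worlds reading of a count query, the sketch below enumerates every world for a handful of linkages and accumulates the probability of each possible count. It is not the paper's algorithm: it assumes the linkages are mutually independent (a simplification made here only so that a joint distribution exists), and the function name `count_distribution` and the example probabilities are hypothetical.

```python
from itertools import product


def count_distribution(probs):
    """Return dist, where dist[k] is the probability that exactly k linkages hold.

    Assumption (not from the paper): linkages are independent, so the
    probability of a possible world is the product of per-linkage terms.
    """
    dist = [0.0] * (len(probs) + 1)
    # Each possible world fixes, for every linkage, whether it holds or not.
    for world in product([False, True], repeat=len(probs)):
        world_prob = 1.0
        for holds, p in zip(world, probs):
            world_prob *= p if holds else 1.0 - p
        # The count query evaluated in this world is the number of held linkages.
        dist[sum(world)] += world_prob
    return dist


# Hypothetical linkage probabilities, for illustration only.
linkage_probs = [0.9, 0.5, 0.2]
dist = count_distribution(linkage_probs)
for k, p in enumerate(dist):
    print(f"P(count = {k}) = {p:.2f}")
```

The answer is the whole distribution rather than a single number, which is the point the abstract makes. Enumeration is exponential in the number of linkages, so this form is only suitable for very small examples.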
Ming Hua, Jian Pei
Added 29 Sep 2012
Updated 29 Sep 2012
Type Journal
Year 2012
Where EDBT