ICDE
2007
IEEE

Indexing Uncertain Categorical Data

Uncertainty in categorical data is commonplace in many applications, including data cleaning, database integration, and biological annotation. In such domains, the correct value of an attribute is often unknown, but may be selected from a reasonable number of alternatives. Current database management systems do not provide a convenient means for representing or manipulating this type of uncertainty. In this paper we extend traditional systems to explicitly handle uncertainty in data values. We propose two index structures for efficiently searching uncertain categorical data, one based on the R-tree and another based on an inverted index structure. Using these structures, we provide a detailed description of the probabilistic equality queries they support. Experimental results using real and synthetic datasets demonstrate how these index structures can effectively improve the performance of queries through the use of internal probabilistic information.
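
The abstract does not spell out the data model or the query semantics, so the following is only a minimal sketch under stated assumptions. It assumes an uncertain categorical value is a discrete distribution over categories, that two independent uncertain values are equal with probability equal to the sum over shared categories of the product of their probabilities, and that a threshold query returns the tuples whose equality probability with the query reaches a cutoff `tau`. The names `equality_probability`, `InvertedIndex`, and `threshold_query` are hypothetical, and the candidate lookup is a plain inverted-list scan, not the search strategy of the paper.

```python
from collections import defaultdict


def equality_probability(u, v):
    """Pr(u = v) for two independent uncertain categorical values.

    Each value is a dict mapping category -> probability (assumed model).
    """
    # Only categories present in both distributions contribute.
    if len(u) > len(v):
        u, v = v, u
    return sum(p * v[c] for c, p in u.items() if c in v)


class InvertedIndex:
    """Hypothetical inverted index: one posting list per category."""

    def __init__(self):
        self.postings = defaultdict(list)  # category -> [(tuple_id, prob)]
        self.tuples = {}                   # tuple_id -> distribution

    def add(self, tuple_id, dist):
        # Reject non-positive entries and distributions summing above 1.
        if any(p <= 0 for p in dist.values()) or sum(dist.values()) > 1 + 1e-9:
            raise ValueError("invalid probability distribution")
        self.tuples[tuple_id] = dict(dist)
        for category, p in dist.items():
            self.postings[category].append((tuple_id, p))

    def threshold_query(self, query, tau):
        """Return (tuple_id, probability) pairs with Pr(equal) >= tau."""
        # Tuples sharing no category with the query have probability 0,
        # so only the posting lists of the query's categories are scanned.
        candidates = set()
        for category in query:
            for tuple_id, _ in self.postings.get(category, []):
                candidates.add(tuple_id)
        results = []
        for tuple_id in candidates:
            p = equality_probability(query, self.tuples[tuple_id])
            if p >= tau:
                results.append((tuple_id, p))
        # Highest probability first; ties broken by tuple id.
        results.sort(key=lambda r: (-r[1], r[0]))
        return results


# Illustrative data only: made-up problem categories for three records.
index = InvertedIndex()
index.add("t1", {"brake": 0.5, "tires": 0.5})
index.add("t2", {"brake": 0.9, "trans": 0.1})
index.add("t3", {"exhaust": 1.0})
print(index.threshold_query({"brake": 1.0}, tau=0.6))
```

The point of the posting lists in this sketch is pruning: a record such as `t3`, which shares no category with the query, is never touched. Using the stored probabilities to stop scanning a list early would be a further refinement that is not attempted here.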
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2007
Where ICDE
Authors Sarvjeet Singh, Chris Mayfield, Sunil Prabhakar, Rahul Shah, Susanne E. Hambrusch