This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated la...
Victor S. Sheng, Foster J. Provost, Panagiotis G. ...
We present a generalization of frequent itemsets allowing the notion of errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifie...
With the increased abilities for automated data collection made possible by modern technology, the typical sizes of data collections have continued to grow in recent years. In suc...
Naïve Bayes is a well-known effective and efficient classification algorithm, but its probability estimation performance is poor. Averaged One-Dependence Estimators, simply AODE,...
Selectivity estimation of queries is an important and wellstudied problem in relational database systems. In this paper, we examine selectivity estimation in the context of Geogra...