Estimating the result size of complex queries that involve selection on multiple attributes and the join of several relations is a difficult but fundamental task in database query...
In many application domains there is a large amount of unlabeled data but only a very limited amount of labeled training data. One general approach that has been explored for util...
Avrim Blum, John D. Lafferty, Mugizi Robert Rweban...
Essentially all data mining algorithms assume that the data-generating process is independent of the data miner's activities. However, in many domains, including spam detectio...
Nilesh N. Dalvi, Pedro Domingos, Mausam, Sumit K. ...
We address the problem of auditing an election when precincts may have different sizes. Prior work in this field has emphasized the simpler case when all precincts have the same s...
In this work we provide a new methodology for comparing regression functions m1 and m2 from two samples. Since apart from smoothness no other (parametric) assumptions are required...