Several algorithms have been proposed for finding the “best,” “optimal,” or “most interesting” rule(s) in a database according to a variety of metrics including confid...
The problem of data cleaning, which consists of removing inconsistencies and errors from original data sets, is well known in the area of decision support systems and data warehou...
Helena Galhardas, Daniela Florescu, Dennis Shasha,...
In this paper, we present an overview of generalized expectation criteria (GE), a simple, robust, scalable method for semi-supervised training using weakly-labeled data. GE fits m...
Maintaining materialized views that have join conditions between arbitrary pairs of data sources possibly with cycles is critical for many applications. In this work, we model vie...
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximate...