Entity matching (EM) is the task of identifying records that refer to the same real-world entity from different data sources. While EM is widely used in data integration and data...
We introduce a new EM framework in which it is possible not only to optimize the model parameters but also the number of model components. A key feature of our approach is that we...
Robust clustering of data into overlapping linear subspaces is a common problem. Here we consider one-dimensional subspaces that cross the origin. This problem arises in blind sour...
Probabilistic Decision Graphs (PDGs) are a class of graphical models that can naturally encode some context specific independencies that cannot always be efficiently captured by...
We show that, given data from a mixture of k well-separated spherical Gaussians in Rd, a simple two-round variant of EM will, with high probability, learn the parameters of the Ga...