Distributed systems are inherently complex, and therefore difficult to design and develop. Experience shows that new technologies—such as components, aspects, and application f...
Text mining, though still a nascent industry, has been growing quickly along with the awareness of the importance of unstructured data in business analytics, customer retention an...
Abstract We address the problem of indexing broadcast audiovisual documents (such as films, news). Starting from a collection of so-called shots, we aim at building automatically h...
Data integrated from multiple sources may contain inconsistencies that violate integrity constraints. The constraint repair problem attempts to find "low cost" changes t...
Philip Bohannon, Michael Flaster, Wenfei Fan, Raje...
Duplicate detection is the problem of detecting different entries in a data source representing the same real-world entity. While research abounds in the realm of duplicate detect...