Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer t...
For centuries, the deep connection between languages has brought about major discoveries about human communication. In this paper we investigate how this powerful source of inform...
A pervasive problem in large relational databases is identity uncertainty which occurs when multiple entries in a database refer to the same underlying entity in the world. Relati...
We present a distributed machine learning framework based on support vector machines that allows classification problems to be solved iteratively through parallel update algorithm...
Recent research in automated learning has focused on algorithms that learn from a combination of tagged and untagged data. Such algorithms can be referred to as semi-supervised in...