The quest to nd models usefully characterizing data is a process central to the scienti c method, and has been carried out on many fronts. Researchers from an expanding number of ...
Entity resolution is the problem of determining which records in a database refer to the same entities, and is a crucial and expensive step in the data mining process. Interest in...
Abstract. In this paper, we present an extensive study of the cuttingplane algorithm (CPA) applied to structural kernels for advanced text classification on large datasets. In par...
In biomedical articles, terms often refer to different protein entities. For example, an arbitrary occurrence of term p53 might denote thousands of proteins across a number of spec...
In this paper we address the problem of identifying a broad range of term variations in Japanese web search queries, where these variations pose a particularly thorny problem due ...