We introduce a new algorithm for binary classification in the selective sampling protocol. Our algorithm uses Regularized Least Squares (RLS) as base classifier, and for this reas...
The paper proposes identifying relevant information sources from the history of combined searching and browsing behavior of many Web users. While it has been previously shown that...
Naive Bayes and logistic regression perform well in different regimes. While the former is a very simple generative model which is efficient to train and performs well empirically...
The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied...
Abstract— This paper discusses the generation of informationrich, arbitrarily-large synthetic data sets which can be used to (a) efficiently learn tests that correlate a set of ...
Haralampos-G. D. Stratigopoulos, Salvador Mir, Yio...