We describe a framework of bootstrapped hypothesis testing for estimating the confidence in one web search engine outperforming another over any randomly sampled query set of a gi...
Eric C. Jensen, Steven M. Beitzel, Ophir Frieder, ...
We describe a Markov chain Monte Carlo (MCMC)-based algorithm for sampling solutions to mixed Boolean/integer constraint problems. The focus of this work differs in two points from...
Metasearch engine, Comparison-shopping and Deep Web crawling applications need to extract search result records enwrapped in result pages returned from search engines in response ...
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the e...