—A huge portion of todays Web consists of web pages filled with information from myriads of online databases. This part of the Web, known as the deep Web, is to date relatively ...
This paper describes an intelligent agent to facilitate bitext mining from the Web via automatic discovery of URL pairing patterns (or keys) for retrieving parallel web pages. The...
We present a new approach for personalizing Web search results to a specific user. Ranking functions for Web search engines are typically trained by machine learning algorithms u...
David Sontag, Kevyn Collins-Thompson, Paul N. Benn...
The traditional crawlers used by search engines to build their collection of Web pages frequently gather unmodified pages that already exist in their collection. This creates unne...
The world wide web has a wealth of information that is related to almost any text classification task. This paper presents a method for mining the web to improve text classificati...