In this paper, we introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient min...
– We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
With the increase in information on the World Wide Web it has become difficult to quickly find desired information without using multiple queries or using a topic-specific search ...
Sofus A. Macskassy, Arunava Banerjee, Brian D. Dav...
We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through...
Adriano Veloso, Wagner Meira Jr., Marco Cristo, Ma...
Part of the process of data integration is determining which sets of identifiers refer to the same real-world entities. In integrating databases found on the Web or obtained by us...