Abstract. As the type of content available on the web is becoming increasingly diverse, a particular challenge is to properly determine the types of documents sought by a user, tha...
Shanu Sushmita, Benjamin Piwowarski, Mounia Lalmas
This paper pursues the recently emerging paradigm of searching for entities that are embedded in Web pages. We utilize informationextraction techniques to identify entity candidat...
Julia Stoyanovich, Srikanta J. Bedathur, Klaus Ber...
A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of rev...
In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled si...
Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffn...
Finding a set of web pages relevant to a user’s information goal is difficult due to the enormous size of the Internet. Search engines are able to find a set of pages that mat...