We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Query expansion techniques generally select new query terms from a set of top-ranked documents. Although a user’s manual judgment of those documents would greatly help in selecting goo...
In this paper, we present our online summarization system of web topics. The user defines the topic by a set of keywords. Then the system searches the Web for the relevant documen...
Semantic similarity measurement is a key methodology in various domains ranging from cognitive science to geographic information retrieval on the Web. Meaningful notions of similar...
This paper describes a question answering system that is designed to capitalize on the tremendous amount of data that is now available online. Most question answering systems use ...
Susan T. Dumais, Michele Banko, Eric Brill, Jimmy ...