A large amount of empirically derived world knowledge is essential for many language-processing tasks, to create expectations that can help assess plausibility and guide disambigua...
We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
In this paper, we present our online summarization system of web topics. The user defines the topic by a set of keywords. Then the system searches the Web for the relevant documen...
Semantic similarity measurement is a key methodology in various domains ranging from cognitive science to geographic information retrieval on the Web. Meaningful notions of similar...
This paper describes a question answering system that is designed to capitalize on the tremendous amount of data that is now available online. Most question answering systems use ...
Susan T. Dumais, Michele Banko, Eric Brill, Jimmy ...