Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are closel...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...
We conduct large-scale search engine relevance experiments, using the 12% of queries that contain placenames, matching the placenames to places in the documents, and examining the...
Term-weighting schemes are vital to the performance of Information Retrieval models that use term frequency characteristics to determine the relevance of a document. The vector spa...
Realtime web search refers to the retrieval of very fresh content which is in high demand. An effective portal web search engine must support a variety of search needs, including ...