A frozen 18.5 million page snapshot of part of the Web has been created to enable and encourage meaningful and reproducible evaluation of Web search systems and techniques. This c...
David Hawking, Nick Craswell, Paul B. Thistlewaite...
XML documents represent a middle range between unstructured data such as textual documents and fully structured data encoded in databases. Typically, information retrieval techniq...
Yosi Mass, Dafna Sheinwald, Benjamin Sznajder, Siv...
Incremental hierarchical text document clustering algorithms are important in organizing documents generated from streaming on-line sources, such as, Newswire and Blogs. However, ...
Cross Language Information Retrieval community has brought up search engines over multilingual corpora, and multilingual text categorization systems. In this paper, we focus on th...
We describe Eureka, a file system search engine that takes into account the inherent relationships among files in order to improve the rankings of search results. The key idea b...