In this paper, we describe the Travelers in the Middle East Archive (TIMEA), a digital archive focused on Western explorations in the Middle East between the 18th and early 20th c...
Lisa M. Spiro, Marie Wise, Geneva L. Henry, Chuck ...
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...
Document-centric XML collections contain text-rich documents, marked up with XML tags. The tags add lightweight semantics to the text. Querying such collections calls for a hybrid...
There is a common availability of classification terms in online text collections and digital libraries, such as manually assigned keywords or key-phrases from a controll...
Modern distributed information retrieval techniques require accurate knowledge of collection size. In non-cooperative environments, where detailed collection statistics are not av...