In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
A novel method for simultaneous keyphrase extraction and generic text summarization is proposed by modeling text documents as weighted undirected and weighted bipartite graphs. Sp...
Several IR tasks rely, to achieve high efficiency, on a single pervasive data structure called the inverted index. This is a mapping from the terms in a text collection to the docu...
A distributed information retrieval system with resourceselection and resultset merging capability was used to search subsets of the GOV2 document corpus for the 2008 TREC Million...
Christopher T. Fallen, Gregory B. Newby, Kylie McC...
Abstract. We present PlanetP, a peer-to-peer (P2P) content search and retrieval infrastructure targeting communities wishing to share large sets of text documents. P2P computing is...
Francisco Matias Cuenca-Acuna, Christopher Peery, ...