Several IR tasks rely, to achieve high efficiency, on a single pervasive data structure called the inverted index. This is a mapping from the terms in a text collection to the docu...
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
Topic Detection and Tracking is an event-based information organization task where online news streams are monitored in order to spot new unreported events and link documents with ...
Juha Makkonen, Helena Ahonen-Myka, Marko Salmenkiv...
The domain of Digital Libraries presents specific challenges for unsupervised information extraction to support both the automatic classification of documents and the enhancement ...
Mikalai Krapivin, Maurizio Marchese, Andrei Yadran...
In traditional peer-to-peer search networks, operations focus on properly labeled files such as music or video, and the actual search is often limited to text tags. The explosive ...