Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowled...
This article describes an algorithm called HyperLex that is capable of automatically determining word uses in a textbase without recourse to a dictionary. The algorithm makes use ...
We present a model, based on the maximum entropy method, for analyzing various measures of retrieval performance such as average precision, R-precision, and precision-at-cutoffs....
This paper proposes a method for creating a high quality collection of researchers’ homepages. The proposed method consists of three phases: rough filtering of the possible web p...
To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...