In Latent Semantic Indexing (LSI), a collection of documents is often pre-processed to form a sparse term-document matrix, followed by a computation of a low-rank approximation to...
This paper introduces a new approach to add fault-tolerance to a fulltext retrieval system. The weighted pattern morphing technique circumvents some of the disadvantages of the wid...
The approximate string matching problem is to find all locations at which a query of length m matches a substring of a text of length n with k-or-fewer differences. Simple and prac...
We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called “GaP” for Gamma-Poisson, the distri...
1 The latent semantic indexing (LSI) methodology for information retrieval applies the singular value decomposition to identify an eigensystem for a large matrix, in which cells re...