In Latent Semantic Indexing (LSI), a collection of documents is often pre-processed to form a sparse term-document matrix, followed by a computation of a low-rank approximation to...
Word segmentation is a crucial step for segmentation-free document analysis systems and is used for creating an index based on word matching. In this paper, we propose a novel met...
In this paper we propose a novel document retrieval model in which text queries are augmented with multi-dimensional taxonomy restrictions. These restrictions may be relaxed at a ...
Marcus Fontoura, Vanja Josifovski, Ravi Kumar, Chr...
Documents often contain inherently many concepts reflecting specific and generic aspects. To automatically generate a short summary text of documents on similar topics, it is im...
Digital Libraries will hold huge amounts of text and other forms of information. For the collections to be maximally useful, they must be highly organized with useful indexes and ...
Robert P. Futrelle, Xiaolan Zhang 0002, Yumiko Sek...