In Latent Semantic Indexing (LSI), a collection of documents is often pre-processed to form a sparse term-document matrix, followed by a computation of a low-rank approximation to...
Systems based on statistical and machine learning methods have been shown to be extremely effective and scalable for the analysis of large amount of textual data. However, in the r...
Abstract. Searching specialized collections, such as biomedical literature, typically requires intimate knowledge of a specialized terminology. Hence, it can be a disappointing exp...
Many of the available image databases have keyword annotations associated with the images. In spite of the availability of good quality low-level visual features that reflect wel...
Researchers in the data mining area frequently have to spend significant portion of their time on preprocessing the data in order to apply their algorithms to real-world datasets...
Zhaoqi Chen, Dmitri V. Kalashnikov, Sharad Mehrotr...