Multi-label informed latent semantic indexing

Latent semantic indexing (LSI) is a well-known unsupervised approach for dimensionality reduction in information retrieval. However, if output information (i.e., category labels) is available, it is often beneficial to derive the indexing not only from the inputs but also from the target values in the training data set. This is particularly important in multi-label applications, where each document can belong to several categories simultaneously. In this paper we introduce the multi-label informed latent semantic indexing (MLSI) algorithm, which preserves the information of the inputs while also capturing the correlations between the multiple outputs. The recovered “latent semantics” thus incorporate the human-annotated category information and can be used to substantially improve prediction accuracy. An empirical study on two data sets, Reuters-21578 and RCV1, demonstrates very encouraging results. Categories and Subject Descriptors H.3 [Information Storage and Re...
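The abstract does not state MLSI's objective, so the following is only a minimal, hypothetical sketch of the general idea of a label-informed LSI, not the authors' formulation. It assumes a documents × terms matrix `X`, a documents × labels 0/1 matrix `Y`, and a hypothetical trade-off weight `beta` in [0, 1]. It picks orthonormal term-space directions `v` that maximize `(1 - beta) * ||X v||^2 + beta * ||Y^T X v||^2`, i.e. the top eigenvectors of `X^T ((1 - beta) I + beta Y Y^T) X`: the first term is the usual LSI criterion (input variance), the second rewards directions whose document coordinates co-vary with the labels. With `beta = 0` it reduces to plain LSI. The names `label_informed_basis`, `top_eigvecs`, and `project` are illustrative assumptions, and power iteration stands in for a proper SVD/eigensolver to keep the example dependency-free.

```python
import math
import random


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def _orthonormalize(v, basis):
    """Remove components of v along earlier orthonormal vectors, then normalize."""
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("matrix rank is smaller than k")
    return [x / norm for x in v]


def top_eigvecs(m, k, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration.

    Each new vector is kept orthogonal to those already found, which
    restricts the iteration to the remaining eigenspace.
    """
    rng = random.Random(seed)
    n = len(m)
    basis = []
    for _ in range(k):
        v = _orthonormalize([rng.random() + 0.1 for _ in range(n)], basis)
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            v = _orthonormalize(w, basis)
        basis.append(v)
    return basis


def label_informed_basis(x, y, k, beta):
    """Hypothetical label-informed LSI variant (NOT the paper's exact MLSI).

    x: documents x terms, y: documents x labels (0/1), 0 <= beta <= 1.
    Returns k orthonormal term-space directions: top eigenvectors of
    X^T ((1 - beta) I + beta Y Y^T) X.  beta = 0 gives plain LSI directions.
    """
    n = len(x)
    yyt = matmul(y, transpose(y))  # document-document label overlap
    g = [[(1 - beta) * (1.0 if i == j else 0.0) + beta * yyt[i][j]
          for j in range(n)] for i in range(n)]
    m = matmul(matmul(transpose(x), g), x)  # terms x terms, symmetric PSD
    return top_eigvecs(m, k)


def project(doc, basis):
    """Map a term vector into the k-dim latent space (no labels needed)."""
    return [sum(d * b for d, b in zip(doc, v)) for v in basis]


if __name__ == "__main__":
    X = [[3.0, 0.0],  # doc 0: dominated by term 0
         [0.0, 1.0]]  # doc 1: dominated by term 1
    Y = [[0.0],       # doc 0: no label
         [1.0]]       # doc 1: tagged with the single category
    print(label_informed_basis(X, Y, k=1, beta=0.0))  # LSI: favors term 0
    print(label_informed_basis(X, Y, k=1, beta=0.9))  # labels pull toward term 1
```

In this toy example, plain LSI picks the high-variance term 0, while a large `beta` shifts the latent direction toward term 1, the term that actually distinguishes the labeled document. Because the learned basis lives in term space, new unlabeled documents are mapped with `project` alone.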
Kai Yu, Shipeng Yu, Volker Tresp
Added: 26 Jun 2010
Updated: 26 Jun 2010
Type: Conference
Year: 2005