Hierarchy-Regularized Latent Semantic Indexing

Organizing textual documents into a hierarchical taxonomy is a common practice in knowledge management. Besides textual features, the hierarchical structure of directories reflects additional and important knowledge annotated by experts. It is generally desirable to incorporate this information into text mining processes. In this paper, we propose hierarchy-regularized latent semantic indexing, which encodes the hierarchy into a similarity graph of documents and then formulates an optimization problem mapping each document into a low-dimensional vector space. The new feature space preserves the intrinsic structure of the original taxonomy and thus provides a meaningful basis for various learning tasks like visualization and classification. Our approach employs the information about class proximity and class specificity, and can naturally cope with multi-labeled documents. Our empirical studies show very encouraging results on two real-world data sets, the new Reuters (RCV1) benchmark ...
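The abstract does not give the similarity measure or the objective function, so the following is only a rough sketch of the general idea, not the authors' formulation. It assumes a shared-prefix similarity between category paths (with an implicit common root) and swaps in a Laplacian-eigenmaps-style spectral embedding for the optimization step. The names `path_similarity`, `hierarchy_graph`, and `spectral_embedding` are hypothetical, and the sketch ignores the textual features that the paper also uses.

```python
import numpy as np

def path_similarity(a, b):
    """Assumed measure: shared-prefix depth of two category paths.

    An implicit root is counted so that any two documents in the
    taxonomy keep a small positive similarity.
    """
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return (shared + 1) / (max(len(a), len(b)) + 1)

def hierarchy_graph(doc_paths):
    """Build a document similarity matrix from category paths.

    doc_paths[i] is a list of paths (tuples), so a multi-labeled
    document simply lists several paths; the best-matching pair of
    paths decides the edge weight (an assumption of this sketch).
    """
    n = len(doc_paths)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = max(
                path_similarity(p, q)
                for p in doc_paths[i]
                for q in doc_paths[j]
            )
    return W

def spectral_embedding(W, dim=2):
    """Stand-in for the paper's optimization: embed with the
    eigenvectors of the normalized graph Laplacian that belong to the
    smallest nonzero eigenvalues (Laplacian-eigenmaps style)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]    # skip the trivial first eigenvector

# Toy taxonomy, invented for illustration only.
docs = [
    [("sci", "physics")],
    [("sci", "physics")],
    [("sci", "bio")],
    [("sports", "soccer")],
    [("sports", "soccer")],
]
W = hierarchy_graph(docs)
Y = spectral_embedding(W, dim=2)
```

In this toy setup, documents filed under the same leaf category land closer together in `Y` than documents from different top-level branches, which is the kind of taxonomy-preserving behavior the abstract describes.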
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where ICDM
Authors Yi Huang, Kai Yu, Matthias Schubert, Shipeng Yu, Volker Tresp, Hans-Peter Kriegel