A new dictionary-based text categorization approach is proposed to classify the chemical web pages efficiently. Using a chemistry dictionary, the approach can extract chemistry-re...
Chunyan Liang, Li Guo, Zhaojie Xia, Feng-Guang Nie...
In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering&...
Inderjit S. Dhillon, Subramanyam Mallela, Rahul Ku...
Abstract. This paper proposes the use of Latent Semantic Indexing (LSI) techniques, decomposed with semi-discrete matrix decomposition (SDD) method, for text categorization. The SD...
Previous work on text mining has almost exclusively focused on a single stream. However, we often have available multiple text streams indexed by the same set of time points (call...
Xuanhui Wang, ChengXiang Zhai, Xiao Hu, Richard Sp...
In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of latent Dirichlet allocation (LDA) topic model to segment a ...