Discovery of numerous specific topics via term co-occurrence analysis

We describe efficient techniques for construction of large term co-occurrence graphs, and investigate an application to the discovery of numerous fine-grained (specific) topics. A topic is a small dense subgraph discovered by a random walk initiated at a term (node) in the graph. We observe that the discovered topics are highly interpretable, and reveal the different meanings of terms in the corpus. We show the information-theoretic utility of the topics when they are used as features in supervised learning. Such features lead to consistent improvements in classification accuracy over the standard bag-of-words representation, even at high training proportions. We explain how a layered pyramidal view of the term distribution helps in understanding the algorithms and in visualizing and interpreting the topics.
Categories and Subject Descriptors: H.3.3 [Information Systems]: Information Search and Retrieval--Clustering
General Terms: Algorithms
Keywords: unsupervised learning, text mining, ...
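The abstract's two core ideas, a term co-occurrence graph and a short walk started at a seed term, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `build_cooccurrence_graph` and `walk_topic`, the document-level co-occurrence weighting, the two-step walk, and the toy corpus are all assumptions made for illustration. Instead of sampling a random walk, the sketch propagates the walk's probability mass exactly and keeps the terms that accumulate the most mass.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Build an undirected weighted graph over terms.

    Assumption: an edge weight is the number of documents in which
    both terms appear (other weightings are equally plausible).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def walk_topic(graph, seed, steps=2, size=3):
    """Return the `size` terms that collect the most walk mass from `seed`.

    The walk distribution is propagated for `steps` steps, moving to a
    neighbor with probability proportional to the edge weight. Mass is
    accumulated over all steps, including the start at the seed.
    """
    dist = {seed: 1.0}
    visited = defaultdict(float)
    visited[seed] = 1.0
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            neighbors = graph.get(node, {})
            total = sum(neighbors.values())
            if total == 0:
                nxt[node] += mass  # isolated node: the mass stays put
                continue
            for nbr, weight in neighbors.items():
                nxt[nbr] += mass * weight / total
        dist = nxt
        for node, mass in dist.items():
            visited[node] += mass
    # Sort by accumulated mass, breaking ties alphabetically.
    return sorted(visited, key=lambda t: (-visited[t], t))[:size]


if __name__ == "__main__":
    # Hypothetical toy corpus in which "jaguar" has two senses.
    docs = [
        ["jaguar", "car", "engine"],
        ["jaguar", "car", "speed"],
        ["jaguar", "cat", "jungle"],
        ["cat", "jungle", "prey"],
    ]
    graph = build_cooccurrence_graph(docs)
    print(walk_topic(graph, "car", size=4))
    print(walk_topic(graph, "prey", size=3))
```

On this toy corpus, walks seeded at "car" and at "prey" settle on different neighborhoods around the shared term "jaguar", which is the kind of sense separation the abstract describes. A realistic implementation would also need the efficient graph construction and the dense-subgraph criteria that the paper covers and this sketch omits.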
Omid Madani, Jiye Yu
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where CIKM