Sciweavers

SDM
2007
SIAM

Bursty Feature Representation for Clustering Text Streams

Text representation plays a crucial role in classical text mining, where the primary focus was on static text. However, well-studied static text representations such as TFIDF are not optimized for non-stationary streams of information such as news, discussion board messages, and blogs. We therefore introduce a new temporal representation for text streams based on bursty features. Our bursty text representation differs significantly from traditional schemes in that it 1) dynamically represents documents over time, 2) amplifies a feature in proportion to its burstiness at any point in time, and 3) is topic independent. Our bursty text representation model was evaluated against a classical bag-of-words text representation on the task of clustering TDT3 topical text streams. It consistently yielded more cohesive clusters in terms of cluster purity and cluster/class entropies. This new temporal bursty text representation can be extended to most text mining tasks invol...
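The abstract does not say how burstiness is measured or exactly how it enters a feature's weight. The sketch below is illustrative only and assumes two things: a simple burst score, defined as the amount by which a feature's document frequency in a time window exceeds its average across all windows (a swapped-in heuristic, not the authors' burst detector), and a weight equal to term frequency scaled by `1 + alpha * burst`. It also computes cluster purity, one of the evaluation measures named above, using its standard definition. The names `burst_scores`, `bursty_vector`, `purity`, and the `alpha` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical data model: a stream is a list of time windows,
# and each window is a list of tokenized documents.
Window = List[List[str]]


def doc_freq(window: Window) -> Counter:
    """Fraction of documents in the window that contain each feature."""
    counts: Counter = Counter()
    for doc in window:
        counts.update(set(doc))
    n = max(len(window), 1)
    return Counter({f: c / n for f, c in counts.items()})


def burst_scores(windows: List[Window]) -> List[Dict[str, float]]:
    """Assumed burst score: excess of a feature's per-window document
    frequency over its mean across windows; only bursty features are kept."""
    per_window = [doc_freq(w) for w in windows]
    if not per_window:
        return []
    features = set().union(*per_window)
    mean = {f: sum(p[f] for p in per_window) / len(per_window) for f in features}
    return [{f: p[f] - mean[f] for f in features if p[f] > mean[f]}
            for p in per_window]


def bursty_vector(doc: List[str], scores_t: Dict[str, float],
                  alpha: float = 1.0) -> Dict[str, float]:
    """Amplify each term frequency in proportion to the feature's burst score
    in the document's time window; non-bursty features keep plain TF."""
    tf = Counter(doc)
    return {f: c * (1.0 + alpha * scores_t.get(f, 0.0)) for f, c in tf.items()}


def purity(clusters: List[int], labels: List[str]) -> float:
    """Standard cluster purity: share of documents that belong to the
    majority class of their assigned cluster."""
    by_cluster: Dict[int, Counter] = defaultdict(Counter)
    for c, y in zip(clusters, labels):
        by_cluster[c][y] += 1
    return sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values()) / len(labels)


if __name__ == "__main__":
    windows = [
        [["election", "vote"], ["weather", "rain"]],
        [["election", "vote"], ["election", "poll"]],
        [["weather", "sun"], ["sports", "goal"]],
    ]
    scores = burst_scores(windows)
    print(bursty_vector(["election", "election", "vote"], scores[1]))
    print(purity([0, 0, 1, 1], ["a", "a", "b", "a"]))
```

Because the burst score depends on the window in which a document arrives, the same word can receive different weights at different times, which mirrors the "dynamic" property the abstract describes. The score is computed from feature frequencies alone, without topic labels.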
Qi He, Kuiyu Chang, Ee-Peng Lim, Jun Zhang
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2007
Where SDM