IJCAI
2001

Combining Statistics and Semantics for Word and Document Clustering

A new approach for constructing pseudo-keywords, referred to as Sense Units, is proposed. Sense Units are obtained by a word clustering process in which the underlying similarity reflects both statistical and semantic properties, detected through Latent Semantic Analysis and WordNet respectively. Sense Units are used to recode documents and are evaluated by the performance increase they permit in classification tasks. Experimental results show that accounting for semantic information in fact decreases performance compared to LSI alone. The main weaknesses of the current hybrid scheme are discussed and several directions for improvement are sketched.
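The abstract does not say how the statistical and semantic similarities are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a linear blend with a hypothetical weight `alpha`, uses plain cosine similarity over raw term-document rows as a stand-in for the LSA similarity, and uses a small hand-written table (`SEMANTIC`) as a stand-in for a WordNet-based measure. The toy corpus, the threshold, and all function names are illustrative assumptions.

```python
import math
from collections import defaultdict

# Toy corpus (illustrative data only): each document is a list of tokens.
docs = [
    ["car", "engine", "road"],
    ["automobile", "engine", "road"],
    ["bank", "money", "loan"],
    ["bank", "loan", "credit"],
]

# Hypothetical hand-written scores standing in for a WordNet-based similarity.
SEMANTIC = {
    frozenset(["car", "automobile"]): 1.0,
    frozenset(["money", "credit"]): 1.0,
}


def term_vectors(docs):
    """Map each word to its row of the term-document count matrix."""
    vecs = defaultdict(lambda: [0] * len(docs))
    for j, doc in enumerate(docs):
        for word in doc:
            vecs[word][j] += 1
    return dict(vecs)


def cosine(u, v):
    """Cosine similarity; here a stand-in for similarity in an LSA space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_similarity(w1, w2, vecs, alpha=0.5):
    """Assumed linear blend of the statistical and semantic scores."""
    statistical = cosine(vecs[w1], vecs[w2])
    semantic = SEMANTIC.get(frozenset([w1, w2]), 0.0)
    return alpha * statistical + (1 - alpha) * semantic


def cluster_words(vecs, threshold=0.45, alpha=0.5):
    """Single-link clustering: join words whose hybrid similarity
    reaches the threshold. Returns a word -> cluster label mapping."""
    words = sorted(vecs)
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if hybrid_similarity(a, b, vecs, alpha) >= threshold:
                parent[find(a)] = find(b)
    return {w: find(w) for w in words}


def recode(doc, unit_of):
    """Rewrite a document as counts over word clusters instead of words."""
    counts = defaultdict(int)
    for word in doc:
        counts[unit_of[word]] += 1
    return dict(counts)


vecs = term_vectors(docs)
unit_of = cluster_words(vecs)
recoded = [recode(doc, unit_of) for doc in docs]
```

On this toy data, "car" and "automobile" never co-occur, so only the semantic term can place them in the same cluster; setting `alpha=1.0` reduces the blend to the purely statistical score. The sketch says nothing about classification accuracy, which the abstract reports as lower for the hybrid scheme than for LSI alone.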
Alexandre Termier, Michèle Sebag, Marie-Chr
Type Conference
Year 2001
Where IJCAI
Authors Alexandre Termier, Michèle Sebag, Marie-Christine Rousset