Sciweavers

SAC
2009
ACM

Combining statistics and semantics via ensemble model for document clustering

Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge provided by domain experts, and that knowledge is specific to the particular data set. In this study, we propose an ensemble model that couples two sources of information: statistical information derived from the data set, and sense information retrieved from WordNet, which is used to build a semantic binary model. We evaluated the efficacy of our combined ensemble model on the Reuters-21578 and 20newsgroups data sets.
Keywords: WordNet, ensemble learning, text clustering, disambiguation.
Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan
Added 19 May 2010
Updated 19 May 2010
Type Conference
Year 2009
Where SAC
Authors Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan