Combining statistics and semantics via ensemble model for document clustering

Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge provided by domain experts, and that knowledge is specific to the particular data set. In this study, we propose an ensemble model that couples two sources of information: statistical information derived from the data set, and sense information retrieved from WordNet, which is used to build a semantic binary model. We evaluated the efficacy of our combined ensemble model on the Reuters-21578 and 20newsgroups data sets.

Keywords: WordNet, ensemble learning, text clustering, disambiguation.
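The abstract does not spell out how the two sources of information are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple weighted average of two cosine similarities: one over term counts (the statistical view) and one over binary sense indicators (the semantic binary view). The `SENSES` lexicon is a hypothetical hand-written stand-in for WordNet lookups, it maps each word to a single sense and so skips disambiguation, and the function names, `weight`, and `threshold` are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet sense lookups (assumption).
# Each word maps to exactly one sense, so no disambiguation is performed.
SENSES = {
    "bank": "financial_institution",
    "loan": "financial_institution",
    "money": "financial_institution",
    "river": "body_of_water",
    "stream": "body_of_water",
    "water": "body_of_water",
}


def tf_vector(doc):
    """Statistical view: raw term counts of the document."""
    return Counter(doc.lower().split())


def sense_vector(doc):
    """Semantic binary view: 1 for every sense that occurs in the document."""
    return {SENSES[w]: 1 for w in doc.lower().split() if w in SENSES}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(docs, weight=0.5):
    """Weighted average of the statistical and semantic similarity matrices."""
    tf = [tf_vector(d) for d in docs]
    senses = [sense_vector(d) for d in docs]
    n = len(docs)
    return [
        [
            weight * cosine(tf[i], tf[j])
            + (1 - weight) * cosine(senses[i], senses[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def cluster(sim, threshold=0.4):
    """Label connected components of the graph linking pairs with sim >= threshold."""
    n = len(sim)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


docs = ["bank loan", "money interest", "river stream", "water fish"]
labels = cluster(combined_similarity(docs))
print(labels)  # [0, 0, 1, 1]
```

In this toy input, the first two documents share no terms, so the statistical view alone gives them a similarity of zero. They are grouped only because the sense view maps both to the same concept, which is the kind of gain the combination is meant to provide.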

Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan

Applied Computing | Data Mining Algorithms | Data Sets | Ensemble Model | SAC 2009 |

Related Content

» GaP a factor model for discrete data

» CLUSEQ Efficient and Effective Sequence Clustering

» Performance and Implications of Semantic Indexing in a Distributed Environment

Post Info

Added	19 May 2010
Updated	19 May 2010
Type	Conference
Year	2009
Where	SAC
Authors	Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan
