Concept Discovery from Text

Broad-coverage lexical resources such as WordNet are extremely useful. However, they often include many rare senses while missing domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers concepts from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning elements to their most similar cluster. Evaluating cluster quality has always been a difficult task. We present a new evaluation methodology that is based on the editing distance between output clusters and classes extracted from WordNet (the answer key). Our experiments show that CBC outperforms several well-known clustering algorithms in cluster quality.
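The assignment step described in the abstract (represent each committee by the centroid of its members, then attach every element to its most similar cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. The cosine similarity measure, the toy three-dimensional feature vectors, and the names `centroid`, `cosine`, and `assign` are all assumptions made for illustration. The committee-discovery phase is not shown.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity (assumed here; the abstract does not name the measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign(elements, committees):
    """Map each element to the committee whose centroid is most similar to it."""
    centroids = {name: centroid(members) for name, members in committees.items()}
    return {
        elem: max(centroids, key=lambda c: cosine(vec, centroids[c]))
        for elem, vec in elements.items()
    }

# Hypothetical toy data: two committees, each with two member vectors.
committees = {
    "fruit": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
    "vehicle": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.8]],
}
elements = {"pear": [0.8, 0.3, 0.1], "truck": [0.1, 0.2, 0.9]}
result = assign(elements, committees)
```

In this sketch the centroids are computed from committee members only, so elements assigned later do not shift the cluster's feature vector.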
Dekang Lin, Patrick Pantel
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2002