Sciweavers

COLING
2002

Concept Discovery from Text

13 years 4 months ago
Concept Discovery from Text
Broad-coverage lexical resources such as WordNet are extremely useful. However, they often include many rare senses while missing domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers concepts from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning elements to their most similar cluster. Evaluating cluster quality has always been a difficult task. We present a new evaluation methodology that is based on the editing distance between output clusters and classes extracted from WordNet (the answer key). Our experiments show that CBC outperforms several well-known clustering algorithms in cluster quality.
Dekang Lin, Patrick Pantel
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2002
Where COLING
Authors Dekang Lin, Patrick Pantel
Comments (0)