COLING
2002

Concept Discovery from Text

Download acl.ldc.upenn.edu

Broad-coverage lexical resources such as WordNet are extremely useful. However, they often include many rare senses while missing domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers concepts from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning elements to their most similar cluster. Evaluating cluster quality has always been a difficult task. We present a new evaluation methodology that is based on the editing distance between output clusters and classes extracted from WordNet (the answer key). Our experiments show that CBC outperforms several well-known clustering algorithms in cluster quality.
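The assignment step described above (use the centroid of each committee as the cluster's feature vector, then give every element to its most similar cluster) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the committee-discovery phase is skipped and the committees are simply given, cosine similarity is an assumed similarity measure, and the function names and toy feature vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Average a list of equal-length feature vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity (an assumed measure, not stated in the abstract)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_to_committees(features, committees):
    """Assign each element to the committee whose centroid is most similar.

    features:   dict mapping element -> feature vector (list of floats)
    committees: dict mapping cluster name -> list of member elements
    """
    # The centroid of a committee's members serves as the cluster's feature vector.
    centroids = {
        name: centroid([features[member] for member in members])
        for name, members in committees.items()
    }
    # Each element goes to its single most similar cluster.
    return {
        element: max(centroids, key=lambda name: cosine(vector, centroids[name]))
        for element, vector in features.items()
    }


# Hypothetical toy data: three made-up context features per word.
features = {
    "apple": [1.0, 0.1, 0.0],
    "pear": [0.9, 0.2, 0.0],
    "plum": [0.8, 0.3, 0.1],
    "car": [0.0, 0.1, 1.0],
    "truck": [0.1, 0.0, 0.9],
    "bus": [0.0, 0.2, 0.8],
}
# Committees are assumed to be already discovered (tight, well-separated clusters).
committees = {"fruit": ["apple", "pear"], "vehicle": ["car", "truck"]}

assignment = assign_to_committees(features, committees)
print(assignment)
```

Because only committee members contribute to a centroid, elements such as "plum" and "bus" are classified against the committees without shifting the cluster's feature vector.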

Dekang Lin, Patrick Pantel

Broad-coverage Lexical Resources | Cluster Quality | Clustering Algorithms | COLING 2002

Post Info

Added	17 Dec 2010
Updated	17 Dec 2010
Type	Conference
Year	2002
Where	COLING
Authors	Dekang Lin, Patrick Pantel
