ACL
1993

Distributional Clustering of English Words

Download acl.ldc.upenn.edu

We describe and evaluate experimentally a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering. Clusters are represented by average context distributions derived from the given words according to their probabilities of cluster membership. In many cases, the clusters can be thought of as encoding coarse sense distinctions. Deterministic annealing is used to find lowest distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word co-occurrence, and the models are evaluated with respect to held-out test data.
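
The clustering step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the exponential membership weighting p(c | w) proportional to exp(-beta * D(p_w || p_c)), the uniform weighting of words, the fixed value of `beta`, and the toy data are all assumptions made for illustration, since the abstract gives no formulas. The function names (`kl_divergence`, `memberships`, `update_centroids`, `soft_cluster`) are hypothetical. The annealing schedule, in which `beta` is raised gradually so that clusters split, is left out.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) in nats; eps guards against division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def memberships(word_dists, centroids, beta):
    """Soft assignment: p(c | w) proportional to exp(-beta * D(p_w || p_c)) (assumed form)."""
    result = []
    for p_w in word_dists:
        d = [kl_divergence(p_w, c) for c in centroids]
        d_min = min(d)  # subtract the minimum for numerical stability
        weights = [math.exp(-beta * (di - d_min)) for di in d]
        total = sum(weights)
        result.append([w / total for w in weights])
    return result

def update_centroids(word_dists, member, old_centroids):
    """Each centroid becomes the membership-weighted average of the word distributions."""
    n_contexts = len(word_dists[0])
    centroids = []
    for c, old in enumerate(old_centroids):
        mass = sum(m[c] for m in member)
        if mass == 0.0:
            centroids.append(list(old))  # keep an empty cluster where it was
            continue
        centroids.append([
            sum(m[c] * p_w[k] for m, p_w in zip(member, word_dists)) / mass
            for k in range(n_contexts)
        ])
    return centroids

def soft_cluster(word_dists, centroids, beta, iterations=50):
    """Alternate membership and centroid updates at a fixed beta."""
    for _ in range(iterations):
        member = memberships(word_dists, centroids, beta)
        centroids = update_centroids(word_dists, member, centroids)
    return memberships(word_dists, centroids, beta), centroids

# Toy data (invented): four words, each a distribution over three contexts.
word_dists = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]
init_centroids = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
member, centroids = soft_cluster(word_dists, init_centroids, beta=10.0)
```

With `beta` set to 0 every word belongs equally to every cluster; larger values make the assignments sharper, which is the role the abstract gives to the annealing parameter.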

Fernando C. N. Pereira, Naftali Tishby, Lillian Lee

Post Info

Added	02 Nov 2010
Updated	02 Nov 2010
Type	Conference
Year	1993
Where	ACL
Authors	Fernando C. N. Pereira, Naftali Tishby, Lillian Lee