Identifying Cognates by Phonetic and Semantic Similarity

I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than "orthographic" measures, such as the Longest Common Subsequence Ratio (LCSR) or Dice's coefficient. I introduce a procedure for estimating the semantic similarity of glosses that employs keyword selection and WordNet. Tests performed on vocabularies of four Algonquian languages indicate that the method is capable of discovering, on average, nearly 75% of cognates at 50% precision.
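For readers unfamiliar with the two "orthographic" baselines named in the abstract, the following is a minimal sketch of how they are commonly defined: LCSR as the length of the longest common subsequence divided by the length of the longer word, and Dice's coefficient as twice the number of shared character bigrams divided by the total number of bigrams in both words. The function names (`lcs_length`, `lcsr`, `dice`) are illustrative only, and the abstract does not say which exact variants the paper uses, so this should be read as an assumption rather than the paper's implementation.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer word's length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def dice(a: str, b: str) -> float:
    """Dice's coefficient over character bigrams (assumed variant)."""
    bigrams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bigrams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    shared = sum((bigrams_a & bigrams_b).values())  # multiset intersection
    return 2 * shared / total if total else 0.0
```

Both measures compare spellings character by character, so they treat every pair of distinct letters as equally different; a feature-based phonetic measure can instead give partial credit to sounds that are close in articulation.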

Grzegorz Kondrak

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	NAACL
Authors	Grzegorz Kondrak
