LREC
2010

Building a Cross-lingual Relatedness Thesaurus using a Graph Similarity Measure

The Internet is an ever-growing source of information stored in documents in different languages. Hence, cross-lingual resources are needed for more and more NLP applications. This paper presents (i) a graph-based method for creating one such resource and (ii) a resource created using the method, a cross-lingual relatedness thesaurus. Given a word in one language, the thesaurus suggests words in a second language that are semantically related. The method requires two monolingual corpora and a basic dictionary. Our general approach is to build two monolingual word graphs, with nodes representing words and edges representing linguistic relations between words. A bilingual dictionary containing basic vocabulary provides seed translations that link nodes across the two graphs. We then use an inter-graph node-similarity algorithm to discover related words. Evaluation with three human judges revealed that 49% of the English and 57% of the German words discovered by our method are semantically rel...
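The general approach described in the abstract (two word graphs, seed translations, similarity propagated between graphs) can be illustrated with a minimal sketch. The abstract does not specify the paper's exact algorithm, so this is an assumption: the code below uses a simple SimRank-style propagation in which two words are similar if their neighbours are similar. The function names, the decay factor, the iteration count, and the toy graphs are all hypothetical and chosen only for illustration.

```python
from itertools import product


def cross_graph_similarity(graph_a, graph_b, seeds, iterations=5, decay=0.8):
    """SimRank-style similarity between nodes of two graphs (illustrative only).

    graph_a, graph_b: dict mapping each node to the set of its neighbours.
    seeds: set of (node_a, node_b) dictionary translations, held at 1.0.
    Returns a dict mapping (node_a, node_b) to a score in (0, 1].
    """
    sim = {pair: 1.0 for pair in seeds}
    for _ in range(iterations):
        # Seed translations keep their score in every round.
        new_sim = {pair: 1.0 for pair in seeds}
        for a, b in product(graph_a, graph_b):
            if (a, b) in seeds:
                continue
            neigh_a, neigh_b = graph_a[a], graph_b[b]
            if not neigh_a or not neigh_b:
                continue
            # Average similarity over all neighbour pairs, damped by decay.
            total = sum(sim.get((x, y), 0.0) for x in neigh_a for y in neigh_b)
            if total > 0.0:
                new_sim[(a, b)] = decay * total / (len(neigh_a) * len(neigh_b))
        sim = new_sim
    return sim


def related_words(sim, word, top_n=3):
    """Return up to top_n (word_b, score) pairs for a source word, best first."""
    hits = [(b, score) for (a, b), score in sim.items() if a == word]
    return sorted(hits, key=lambda item: item[1], reverse=True)[:top_n]


# Toy monolingual graphs (hypothetical data, not taken from the paper).
english = {"car": {"wheel", "drive"}, "wheel": {"car"}, "drive": {"car"}}
german = {"Auto": {"Rad", "fahren"}, "Rad": {"Auto"}, "fahren": {"Auto"}}

# Seed translations as a basic dictionary might provide them.
seeds = {("wheel", "Rad"), ("drive", "fahren")}

if __name__ == "__main__":
    scores = cross_graph_similarity(english, german, seeds)
    print(related_words(scores, "car"))
```

In this toy setup "car" and "Auto" are not in the seed dictionary, yet they receive a nonzero score because their neighbours are seed translations of each other. A real system would need typed edges, far larger graphs, and a more efficient update than the all-pairs loop used here.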
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where LREC
Authors Lukas Michelbacher, Florian Laws, Beate Dorow, Ulrich Heid, Hinrich Schütze