Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet


www.ltc.amu.edu.pl

The paper reports on a series of experiments conducted to test the feasibility of automatically generating synsets for the Slovene wordnet. The resources used were the multilingual parallel corpus of George Orwell’s Nineteen Eighty-Four and wordnets for several languages. First, the corpus was word-aligned to obtain multilingual lexicons. These lexicons were then compared against the wordnets of the various languages in order to disambiguate the entries and attach the appropriate synset ids to the Slovene entries. Slovene lexicon entries sharing the same synset id were then grouped into a synset. The results obtained with the different experimental settings are evaluated against a manually created gold standard and also checked by hand.
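The lexicon-to-synset steps summarized above can be illustrated with a minimal Python sketch. It assumes word alignment has already produced a multilingual lexicon (each Slovene word paired with its aligned translations) and that all wordnets share one synset-id space. The disambiguation rule used here, keeping only the synset ids shared by every translation found in the available wordnets, is an assumption for illustration; the abstract does not spell out how entries were disambiguated. The function names, the `min_langs` parameter, and the toy data (ids `s1` to `s5`) are hypothetical.

```python
from collections import defaultdict


def attach_synset_ids(lexicon, wordnets, min_langs=2):
    """Attach synset ids to Slovene words by intersecting candidate synsets.

    Hypothetical rule: look up each aligned translation in the wordnet of its
    language and keep only the synset ids shared by all translations found.
    Entries covered by fewer than `min_langs` wordnets are skipped, since a
    single language cannot disambiguate a polysemous word.
    """
    attached = defaultdict(set)
    for sl_word, translations in lexicon:
        candidates = [
            wordnets[lang][lemma]
            for lang, lemma in translations.items()
            if lemma in wordnets.get(lang, {})
        ]
        if len(candidates) < max(min_langs, 1):
            continue
        shared = set.intersection(*candidates)  # new set, wordnets untouched
        if shared:
            attached[sl_word] |= shared
    return dict(attached)


def build_synsets(attached):
    """Group Slovene words that share a synset id into one synset."""
    synsets = defaultdict(set)
    for word, ids in attached.items():
        for synset_id in ids:
            synsets[synset_id].add(word)
    return dict(synsets)


# Hypothetical toy data: all wordnets use one shared synset-id space.
WORDNETS = {
    "en": {"dog": {"s1", "s2"}, "hound": {"s1", "s3"}, "bank": {"s4", "s5"}},
    "cs": {"pes": {"s1"}, "ohař": {"s1"}, "banka": {"s4"}},
}
LEXICON = [
    ("pes", {"en": "dog", "cs": "pes"}),
    ("gonič", {"en": "hound", "cs": "ohař"}),
    ("banka", {"en": "bank", "cs": "banka"}),
    ("kuža", {"en": "dog"}),  # only one wordnet covers it: left unassigned
]

if __name__ == "__main__":
    attached = attach_synset_ids(LEXICON, WORDNETS)
    print(build_synsets(attached))
```

In this toy run "pes" and "gonič" end up in the same synset because their English and Czech translations intersect only in `s1`, while "kuža" is not assigned anything. Raising `min_langs` trades coverage for precision: fewer Slovene words receive a synset id, but each assignment is supported by more languages.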

Darja Fiser
