Sciweavers

SIGIR
2005
ACM

Bootstrapping dictionaries for cross-language information retrieval

The bottleneck for dictionary-based cross-language information retrieval is the lack of comprehensive dictionaries, in particular for many different languages. We here introduce a methodology by which multilingual dictionaries (for Spanish and Swedish) emerge automatically from simple seed lexicons. These seed lexicons are automatically generated, by cognate mapping, from (previously manually constructed) Portuguese and German as well as English sources. Lexical and semantic hypotheses are then validated and new ones iteratively generated by making use of co-occurrence patterns of hypothesized translation synonyms in parallel corpora. We evaluate these newly derived dictionaries on a large medical document collection within a cross-language retrieval setting.

Categories and Subject Descriptors H.3.1 [Content Analysis and Indexing]: Dictionaries, Thesauruses; H.3.3 [Information Search and Retrieval]: Retrieval models
General Terms Algorithms
Keywords Cross-Language Information Retrieva...
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where SIGIR
Authors Kornél G. Markó, Stefan Schulz, Olena Medelyan, Udo Hahn