Sciweavers

CIKM
2008
Springer

Indexing and retrieval of a Greek corpus

13 years 6 months ago
Indexing and retrieval of a Greek corpus
Greek is one of the most difficult languages to handle in Web Information Retrieval (IR) related tasks. Its difficulty stems from the fact that it is grammatically, morphologically and orthographically more complex than the lingua franca of IR, English. In this paper, we address a significant number of issues that originate from the Greek language. We use a number of techniques to determine the correct encoding that is used by web pages written in Greek. We test the effect of using a Greek stopword list in a realistic and controlled Web environment. We employ a character mapping scheme, in order to overcome the problem of the diversity of diacritics used in the language, such as accents and diaeresis. We utilize word distance and fuzzy similarity metrics in order to make up for the different forms that nouns, verbs and articles appear because of conjugations and inflections and additionally handle greeklish queries, a transliterated form of Greek. The conducted experiments present som...
Georgios Paltoglou, Michail Salampasis, Fotis Laza
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2008
Where CIKM
Authors Georgios Paltoglou, Michail Salampasis, Fotis Lazarinis
Comments (0)