Indexing and retrieval of a Greek corpus

Greek is one of the most difficult languages to handle in Web Information Retrieval (IR) tasks. The difficulty stems from the fact that Greek is grammatically, morphologically and orthographically more complex than English, the lingua franca of IR. In this paper, we address a significant number of issues that originate from the Greek language. We use a number of techniques to determine the encoding used by web pages written in Greek. We test the effect of using a Greek stopword list in a realistic and controlled Web environment. We employ a character mapping scheme to overcome the diversity of diacritics used in the language, such as accents and diaeresis. We utilize word distance and fuzzy similarity metrics to compensate for the different forms in which nouns, verbs and articles appear because of conjugation and inflection, and we additionally handle Greeklish queries, that is, queries written in a transliterated form of Greek. The conducted experiments present som...
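As a rough illustration of two ideas named in the abstract, diacritic mapping and fuzzy matching of inflected forms, the following Python sketch may help. It is not the authors' implementation: the function names (`strip_diacritics`, `normalize`, `edit_distance`, `similarity`) are hypothetical, Unicode decomposition stands in for whatever character mapping scheme the paper uses, and plain Levenshtein distance stands in for its word distance and fuzzy similarity metrics.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks, e.g. accents and diaeresis (assumed mapping)."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)


def normalize(term: str) -> str:
    """Lowercase, drop diacritics, and fold final sigma into regular sigma."""
    return strip_diacritics(term.lower()).replace("ς", "σ")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Distance of the normalized terms scaled to [0, 1]; 1.0 means identical."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    print(normalize("Ελλάδα"))
    print(similarity("άνθρωπος", "ανθρώπου"))
```

In this sketch the nominative "άνθρωπος" and the genitive "ανθρώπου" differ by a single character once accents are removed, so a query term can match an inflected index term when the similarity exceeds some threshold. The threshold itself would have to be tuned experimentally.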

Georgios Paltoglou, Michail Salampasis, Fotis Lazarinis

CIKM 2008 | Greek | Greek IR Tasks | Information Management | Web Information Retrieval |

» Relevance feedback using semantic association between indexing terms in large free text co...

» Applying Light Natural Language Processing to Ad-Hoc Cross Language Information Retrieval

» Extracting key-substring-group features for text classification

» Content-based document routing and index partitioning for scalable similarity-based searches...

» Grid-based Indexing of a Newswire Corpus

» CIMWOS: A Multimedia Archiving and Indexing System

» EuroGOV: Engineering a Multilingual Web Corpus

» The anatomy of an ad: structured indexing and retrieval for sponsored search

Post Info

Added	12 Oct 2010
Updated	12 Oct 2010
Type	Conference
Year	2008
Where	CIKM
Authors	Georgios Paltoglou, Michail Salampasis, Fotis Lazarinis

Sciweavers
