Language Identification Strategies for Cross Language Information Retrieval

In our participation in the 2010 LogCLEF track we focused on the analysis of the European Library (TEL) logs, and in particular we experimented with identifying the natural language used in the queries. Language identification is a key task within Cross Language Information Retrieval systems, and it is particularly difficult for search queries, where little contextual information is available: function words (grammatical particles strongly indicative of a specific language, such as prepositions, pronouns, and conjunctions) are usually missing, and the frequent presence of Named Entities can mislead identification of the language used in the query. To meet this challenge with acceptable performance, the techniques applied should differ from those adopted for language guessing on longer and syntactically richer text fragments, such as metadata or textual documents. In particular we experimented combining toget...
Alessio Bosca, Luca Dini
Added 08 Nov 2010
Updated 08 Nov 2010
Type Conference
Year 2010
Where CLEF
