A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
The aim of this paper is twofold. We focus, on the one hand, on the task of dynamically annotating English compound nouns, and on the other hand we propose disambiguation methods ...
This paper describes an approach to providing lexical information for natural language processing in unrestricted domains. A system of approximately 1200 morphological rules is us...
In recent years, blogs and social networks have particularly boosted interests for opinion mining research. In order to satisfy real-scale applicative needs, a main task is to cre...
This paper extends the training and tuning regime for phrase-based statistical machine translation to obtain fluent translations into morphologically complex languages (we build ...