Internet content today is about 80% text-based. No matter static or dynamic, the information is encoded and presented as multilingual, unstructured natural language text pages. As ...
Pavlin Dobrev, Albena Strupchanska, Galia Angelova
The Internet constitutes a potential huge store of parallel text that may be collected to be exploited by many applications such as multilingual information retrieval, machine tran...
: This paper presents interim results of an ongoing project on quality issues concerning Wikipedia. One focus of research is the relation of language and quality measurement. The o...
In this paper we present a proposal to extend WordNet-like lexical databases by adding phrasets, i.e. sets of free combinations of words which are recurrently used to express a co...
The shift from Computational Linguistics to Language Engineering is indicative of new trends in NLP. This paper reviews two NLP engineering problems: reuse and integration, while ...