To better understand the ordering of clause aggregation operators in a text generation application, we manually annotated a small corpus. The annotated corpus supports the preferr...
This paper describes LINGUA, an architecture for text processing in Bulgarian. First, the pre-processing modules for tokenisation, sentence splitting, paragraph segmentation, par...
In this paper we address issues related to building a large-scale Chinese corpus. We try to answer four questions: (i) how to speed up annotation, (ii) how to maintain high annota...
This paper considers several important issues for monolingual and multilingual link detection. The experimental results show that nouns, verbs, adjectives and compound nouns are u...
Less than 1% of the languages spoken in the world are correctly "computerized": spell checkers, hyphenation, and machine translation are still lacking for the others. In thi...