This paper describes the design and collection of NameDat, a database containing English proper names spoken by native Norwegians. The database was designed to cover the typical a...
Abstract. Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves in the trees that are used to represent discourse structures. A de...
This paper proposes a method for extracting bilingual text pairs from a comparable corpus. The basic idea of the method is to apply bootstrapping to an existing corpus-based cross-...
Hiroshi Masuichi, Raymond Flournoy, Stefan Kaufman...
We describe a language-independent, flexible, and accurate method for the detection of abbreviations in text corpora. It is based on the idea that an abbreviation can be viewed as...
Abstract. In this paper we propose an approach to deal with the Chinese-English cross-language image retrieval problem. Text-based image retrieval and query translation methods were...