Motivated by current efforts to construct more realistic spam filtering experimental corpora, we present a newly assembled, publicly available corpus of genuine and unsolicited (s...
Parallel corpora are indispensable resources for a variety of multilingual natural language processing tasks. This paper presents a technique for fully automatic construction of c...
This paper investigates the use and predictive potential of semantic similarity measures for automatically generating links across different documents and passages. First, the ...
An interactive track was included in INEX for the first time in 2004. The main aim of the track was to study the behaviour of searchers when interacting with components of XML doc...
An approach for segmenting handwritten text in a Pre-Hospital Care Report (PCR) is presented. Segmentation of lines and words in a PCR is extremely challenging due to the nature o...