The Dutch HLT agency for language and speech technology (known as TST-centrale) at the Institute for Dutch Lexicology is responsible for the maintenance, distribution and accessib...
In this paper, we propose an algorithm and data structure for computing the term contributed frequency (tcf) for all N-grams in a text corpus. Although term frequency is one of th...
The WindowDiff evaluation measure [12] is becoming the standard criterion for evaluating text segmentation methods. Nevertheless, this metric is really not fair with regard to the...
Sylvain Lamprier, Tassadit Amghar, Bernard Levrat,...
This paper describes work on Named Entity Recognition (NER), in preparation for Relation Extraction (RE), on data from a historical archive organisation. As is often the case in t...
Abstract. The orthography of many resource-scarce languages includes diacritically marked characters. Falling outside the scope of the standard Latin encoding, these characters are...
Guy De Pauw, Peter W. Wagacha, Gilles-Maurice de S...