This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of wellformed se...
In order to solve problems of reliability of systems based on lexical repetition and problems of adaptability of languagedependent systems, we present a context-based topic segmen...
In many applications, replacing a complex word form by its stem can reduce sparsity, revealing connections in the data that would not otherwise be apparent. In this paper, we focu...
Shane Bergsma, Aditya Bhargava, Hua He, Grzegorz K...
The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragment...
George Tsatsaronis, Iraklis Varlamis, Michalis Vaz...
Abstract: In this paper we describe a flexible, portable and languageindependent infrastructure for setting up large monolingual language corpora. The approach is based on collecti...
Christian Biemann, Stefan Bordag, Gerhard Heyer, U...