One of the key tasks for analyzing conversational data is segmenting it into coherent topic segments. However, most models of topic segmentation ignore the social aspect of conver...
Viet-An Nguyen, Jordan L. Boyd-Graber, Philip Resn...
Different from familiar clustering objects, text documents have sparse data spaces. A common way of representing a document is as a bag of its component words, but the semantic re...
In this paper a complete OCR methodology for recognizing historical documents, either printed or handwritten without any knowledge of the font, is presented. This methodology cons...
This paper presents WordRank, a new page ranking system, which exploits similarity between interconnected pages. WordRank introduces the model of the ‘biased surfer’ which is ...
Documents, especially long ones, may contain very diverse passages related to different topics. Passages Retrieval approaches have shown that, in most cases, there is a great pote...
Sylvain Lamprier, Tassadit Amghar, Bernard Levrat,...