This paper describes our participation in the TREC Legal experiments in 2007. We have applied novel normalization techniques that are designed to slightly favor longer documents i...
— For Optical Character Recognition (OCR) of bilingual or multilingual document containing text words in regional language and numerals in English, it is necessary to identify di...
Documents often have inherently parallel structure: they may consist of a text and ries, or an abstract and a body, or parts presenting alternative views on the same problem. Reve...
In this paper, we introduce an alpha-numerical sequences extraction system (keywords, numerical fields or alpha-numerical sequences) in unconstrained handwritten documents. Contra...
In this paper we present a method for detecting the text genre quickly and easily following an approach originally proposed in authorship attribution studies which uses as style m...
Efstathios Stamatatos, Nikos Fakotakis, George K. ...