In this paper we explore the effectiveness of three clustering methods used to perform word image indexing. The three methods are: the Self-Organazing Map (SOM), the Growing Hiera...
Automatic Term Recognition (ATR) is concerned with discovering terminology in large volumes of text corpora. Technical terms are vital elements for understanding the techniques us...
This paper presents an edge-directed super-resolution algorithm for gray level document images without using any training set. This technique creates an image with smooth regions ...
Different dialects of XML have emerged as ubiquitous document exchange formats. For effective collaboration based on such documents, the capability to propagate edit operations pe...
Abstract. This paper investigates a new extension of the Probabilistic Latent Semantic Analysis (PLSA) model [6] for text classification where the training set is partially labeled...