We consider the problem of document conversion from the rendering-oriented HTML markup into a semantic-oriented XML annotation defined by user-specific DTDs or XML Schema descrip...
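The abstract is cut off before the method is described, so the following is only a hand-coded baseline and not the authors' approach. It is a minimal Python sketch that assumes well-formed input in which every tag is closed. The sketch drops rendering-only tags and renames the remaining ones through a lookup table. The mapping `TAG_MAP`, the target names (`document`, `title`, `paragraph`, `emphasis`), and the class `SemanticConverter` are hypothetical stand-ins for what a user-specific DTD would define. HTML attributes are ignored.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical user-specific mapping; tags not listed are treated as pure rendering.
TAG_MAP = {"html": "document", "h1": "title", "p": "paragraph", "i": "emphasis"}


class SemanticConverter(HTMLParser):
    """Rebuild well-formed HTML as XML, renaming mapped tags and dropping the rest."""

    def __init__(self, tag_map):
        super().__init__()
        self.tag_map = tag_map
        self.root = None
        self.stack = []     # currently open semantic (XML) elements
        self.produced = []  # per open HTML tag: did it create an XML element?

    def handle_starttag(self, tag, attrs):
        target = self.tag_map.get(tag)
        if target is None:
            self.produced.append(False)  # rendering-only tag: keep its text, drop the tag
            return
        elem = ET.Element(target)
        if self.stack:
            self.stack[-1].append(elem)
        elif self.root is None:
            self.root = elem
        self.stack.append(elem)
        self.produced.append(True)

    def handle_endtag(self, tag):
        if self.produced and self.produced.pop():
            self.stack.pop()

    def handle_data(self, data):
        if not self.stack:
            return  # text outside any semantic element is discarded
        parent = self.stack[-1]
        if len(parent):
            # ElementTree stores text that follows a child in that child's .tail.
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


page = "<html><body><h1>Report</h1><p>Some <b>bold</b> and <i>italic</i> text.</p></body></html>"
converter = SemanticConverter(TAG_MAP)
converter.feed(page)
converter.close()
xml_out = ET.tostring(converter.root, encoding="unicode")
print(xml_out)
```

Tags absent from the mapping (here `body` and `b`) disappear while their text stays in the enclosing element. An unclosed void tag such as `<br>` would desynchronize the two stacks, which is one reason a realistic converter needs more than a fixed lookup table.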
This paper investigates methods to automatically infer structural information from large XML documents. Using XML as a reference format, we approach the schema generation problem ...
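As a rough illustration of what inferring structural information can mean, the Python sketch below records, for every element name, which child elements and attributes occur with it anywhere in a document. It then prints them as simplified DTD declarations. This is our own assumption and not the paper's procedure. The helper names `infer_structure` and `to_dtd` are hypothetical. The content models are deliberately loose (`(a|b)*`), and childless elements are always declared `(#PCDATA)`. The sketch therefore ignores ordering, cardinality, mixed content, and `EMPTY`.

```python
import xml.etree.ElementTree as ET


def infer_structure(xml_text):
    """Map each element name to the child names and attribute names seen with it."""
    root = ET.fromstring(xml_text)
    children, attributes = {}, {}
    for elem in root.iter():  # every element in document order, root included
        children.setdefault(elem.tag, set()).update(child.tag for child in elem)
        attributes.setdefault(elem.tag, set()).update(elem.attrib)
    return children, attributes


def to_dtd(children, attributes):
    """Render the summary as simplified DTD declarations."""
    lines = []
    for tag in sorted(children):
        kids = sorted(children[tag])
        # Loose content model: any mix of observed children, or text if none.
        model = "(" + "|".join(kids) + ")*" if kids else "(#PCDATA)"
        lines.append(f"<!ELEMENT {tag} {model}>")
        for attr in sorted(attributes[tag]):
            lines.append(f"<!ATTLIST {tag} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)


sample = ("<library><book id='1'><title>A</title><author>B</author></book>"
          "<book id='2'><title>C</title></book></library>")
kids_by_tag, attrs_by_tag = infer_structure(sample)
print(to_dtd(kids_by_tag, attrs_by_tag))
```

Because observations are merged across all occurrences, the second `book` (which lacks an `author`) still gets the declaration `<!ELEMENT book (author|title)*>`. On large documents, a streaming parser such as `ET.iterparse` would avoid holding the whole tree in memory.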
In this paper, we present a novel approach for detecting and removing pre-printed rule-lines from binary handwritten Arabic document images. The proposed technique is based on a d...
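The abstract is truncated before the technique is named, so the sketch below substitutes a common baseline, a horizontal projection profile, rather than the authors' method. It assumes an unskewed binary image given as a list of rows, with 1 for ink and 0 for background. The function names and the `min_fill` threshold of 0.8 are hypothetical. Rows whose ink fraction reaches the threshold are treated as rule lines. Their pixels are then erased, except where ink sits directly above or below the line band, so that strokes crossing the line are not severed.

```python
def detect_rule_rows(image, min_fill=0.8):
    """Indices of rows whose fraction of ink pixels (1s) is at least min_fill."""
    width = len(image[0])
    return [y for y, row in enumerate(image) if sum(row) / width >= min_fill]


def remove_rule_rows(image, rule_rows):
    """Erase rule-line rows, keeping pixels where ink touches the band from
    above or below so that crossing handwriting strokes stay connected."""
    height = len(image)
    band = set(rule_rows)
    cleaned = [row[:] for row in image]  # work on a copy
    for y in rule_rows:
        # Nearest rows outside the (possibly multi-row) rule band.
        up = y - 1
        while up in band:
            up -= 1
        down = y + 1
        while down in band:
            down += 1
        for x in range(len(image[y])):
            ink_above = up >= 0 and image[up][x] == 1
            ink_below = down < height and image[down][x] == 1
            if not (ink_above or ink_below):
                cleaned[y][x] = 0
    return cleaned


# A vertical stroke (column 2) crossing a one-pixel rule line (row 2).
page = [
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
rules = detect_rule_rows(page)
cleaned_page = remove_rule_rows(page, rules)
print(rules)
for row in cleaned_page:
    print(row)
```

Real scanned pages have skewed, broken, or thick rule lines. Arabic handwriting may also contain long horizontal strokes along the baseline, which a plain row threshold can confuse with rule lines. Handling those cases is what a dedicated technique has to add beyond this baseline.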
Several high-performance OCR products currently exist for either Chinese or English. However, no single OCR technique is simultaneously suited to both the English and the C...
This paper presents an edge-directed super-resolution algorithm for gray-level document images that requires no training set. This technique creates an image with smooth regions ...
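The abstract does not spell out the algorithm, so the following is a generic, training-free edge-directed interpolation rather than the authors' technique. It is a minimal Python sketch that assumes a grayscale image given as a list of rows of numbers. The original pixels are placed on the even positions of a (2h-1) × (2w-1) grid. Each cell centre is then filled by averaging along whichever diagonal has the smaller intensity difference. Finally, the remaining positions are filled by choosing between the horizontal and vertical neighbour pairs in the same way. The function and helper names are hypothetical.

```python
def _pair(grid, y1, x1, y2, x2):
    """Return the two grid values if both positions are inside the grid, else None."""
    h, w = len(grid), len(grid[0])
    if 0 <= y1 < h and 0 <= x1 < w and 0 <= y2 < h and 0 <= x2 < w:
        return grid[y1][x1], grid[y2][x2]
    return None


def _along_flatter(p, q):
    """Average along the pair with the smaller intensity difference.

    A missing pair (None) is skipped; on a tie, all four values are averaged.
    """
    if p is None:
        return (q[0] + q[1]) / 2
    if q is None:
        return (p[0] + p[1]) / 2
    dp, dq = abs(p[0] - p[1]), abs(q[0] - q[1])
    if dp < dq:
        return (p[0] + p[1]) / 2
    if dq < dp:
        return (q[0] + q[1]) / 2
    return (p[0] + p[1] + q[0] + q[1]) / 4


def edge_directed_upscale(img):
    """Upscale a grayscale image (list of rows) to a (2h-1) x (2w-1) grid."""
    h, w = len(img), len(img[0])
    out = [[None] * (2 * w - 1) for _ in range(2 * h - 1)]
    # Original pixels keep their values on the even grid positions.
    for i in range(h):
        for j in range(w):
            out[2 * i][2 * j] = float(img[i][j])
    # Pass 1: cell centres, averaged along the flatter diagonal.
    for i in range(h - 1):
        for j in range(w - 1):
            main = (img[i][j], img[i + 1][j + 1])
            anti = (img[i][j + 1], img[i + 1][j])
            out[2 * i + 1][2 * j + 1] = _along_flatter(main, anti)
    # Pass 2: edge midpoints, choosing between horizontal and vertical neighbours.
    # Their neighbours are originals or centres, so fill order does not matter.
    for y in range(len(out)):
        for x in range(len(out[0])):
            if out[y][x] is None:
                horiz = _pair(out, y, x - 1, y, x + 1)
                vert = _pair(out, y - 1, x, y + 1, x)
                out[y][x] = _along_flatter(horiz, vert)
    return out


image = [[0, 100],
         [100, 100]]
result = edge_directed_upscale(image)
for row in result:
    print(row)
```

On this 2 × 2 example, the centre becomes 100 rather than the 75 that bilinear interpolation would give. The difference arises because averaging is done along the flat diagonal instead of across the edge, which keeps the transition sharp.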