Abstract. Automated language identification of written text is a wellestablished research domain that has received considerable attention in the past. By now, efficient and effecti...
We consider the problem of extracting specified types of information from natural language text. To properly analyze the text, we wish to apply semantic (selectional) constraints ...
In this paper, we propose a new comprehensive methodology in order to evaluate the performance of noisy historical document recognition techniques. We aim to evaluate not only the...
In this paper a morphological tagging approach for document image invoice analysis is described. Tokens close by their morphology and confirmed in their location within different ...
The system presented in this paper finds images and line-drawings in scanned pages; it is a crucial processing step in the creation of a large-scale system to detect and index ima...