Despite ubiquitous claims that optical character recognition (OCR) is a "solved problem," many categories of documents continue to break modern OCR software such as docu...
In this paper we present a system that allows its users to build synthetic graphical documents for the performance evaluation of symbol recognition systems. The key contribution of ...
Mathieu Delalandre, Tony P. Pridmore, Ernest Valve...
In this paper we propose to define a measure of visual similarity to compare different pages in a corpus. This measure is based on the analysis of the visual layout saliency of th...
The Unified Modeling Language (UML) is the standard to specify the structure and behaviour of software systems. The created models are a constitutive part of the software specifi...
Abstract. Term weighting is one of the most important aspects of modern Web retrieval systems. The weight associated with a given term in a document shows the importance of the ter...