We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...
In this article we describe an optical recognition system for historic lute tablature prints that we have built with the aid of the Gamera toolkit for document analysis and recogn...
We are developing a recognition system, named `Infty', for scientific documents including those with mathematical formulae. In this paper, we propose a new system that can re...
The goal to produce effective Optical Character Recognition (OCR) methods has lead to the development of a number of algorithms. The purpose of these is to take the hand-written o...
A web-portal providing access to over 250.000 scanned and OCRed cultural heritage documents is analyzed. The collection consists of the complete Dutch Hansard from 1917 to 1995. E...