Users need more sophisticated tools to handle the growing number of image-based documents available in databases. In this paper, we present a system devoted to the editing and bro...
Abstract--This paper extends the transition method for binarization based on transition pixels, a generalization of edge pixels. This method originally computes transition threshol...
This paper contains a description of the methodology and results of the three TREC submissions made by the Glasgow IR group (glair). In addition to submitting to the ad hoc task, ...
Fabio Crestani, Mark Sanderson, Marcos Theophylact...
A web-portal providing access to over 250.000 scanned and OCRed cultural heritage documents is analyzed. The collection consists of the complete Dutch Hansard from 1917 to 1995. E...
Parabix (parallel bit streams for XML) is an open-source XML parser that employs the SIMD (single-instruction multiple-data) capabilities of modern-day commodity processors to del...