We present a system that classifies pixels in a document image according to marking type such as machine print, handwriting, and noise. A segmenter module first splits an input ...
We describe an unsupervised learning algorithm for extracting sparse and locally shift-invariant features. We also devise a principled procedure for learning hierarchies of invari...
Two-dimensional (2-D) plots in digital documents contain important information. Often, the results of scientific experiments and performance of businesses are summarized using pl...
Xiaonan Lu, James Ze Wang, Prasenjit Mitra, C. Lee...
DjVu is an image compression technique specifically geared towards the compression of scanned documents in color at high resolution. Typical magazine pages in color scanned at 300...
This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Different from the reported methods tha...