Many approaches to Information Extraction (IE) have been proposed in literature capable of finding and extract specific facts in relatively unstructured documents. Their applicatio...
This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scen...
We present a system that classifies pixels in a document image according to marking type such as machine print, handwriting, and noise. A segmenter module first splits an input ...
We present a framework for active learning in the multiple-instance (MI) setting. In an MI learning problem, instances are naturally organized into bags and it is the bags, instea...
Most text categorization methods require text content of documents that is often difficult to obtain. We consider "Collaborative Text Categorization", where each document...