Text streams are becoming more and more ubiquitous, in the forms of news feeds, weblog archives and so on, which result in a large volume of data. An effective way to explore the...
Xiang Wang 0002, Kai Zhang, Xiaoming Jin, Dou Shen
This paper describes a new versatile algorithm for correcting nonlinear distortions, such as curvature of book pages, in camera based document processing. We introduce the idea of...
—Content-based document image retrieval is a new and promising research area. Without OCR, document indexing directly based on image content is more general and convenient. Howev...
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...
—Handwritten text line segmentation on real-world data presents significant challenges that cannot be overcome by any single technique. Given the diversity of approaches and the...