Text extraction in complex color documents

Text extraction in mixed-type documents is a necessary pre-processing stage for many document applications. In mixed-type color documents, text, drawings and graphics appear with millions of different colors. In many cases, text regions are overlaid onto drawings or graphics. In this paper, a new method to automatically detect and extract text in mixed-type color documents is presented. The proposed method is based on a combination of an adaptive color reduction (ACR) technique and a page layout analysis (PLA) approach. The ACR technique is used to obtain the optimal number of colors and to convert the document to these principal colors. Then, using the principal colors, the document image is split into separable color planes. Thus, binary images are obtained, each one corresponding to a principal color. The PLA technique is applied independently to each of the color planes and identi
Added 23 Dec 2010
Updated 23 Dec 2010
Type Journal
Year 2002
Where PR
Authors Charalambos Strouthopoulos, Nikos Papamarkos, Antonios Atsalakis
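
The color-plane splitting step described in the abstract can be illustrated with a minimal sketch. It assumes the principal colors are already known and passed in as a fixed palette; it does not implement the paper's ACR technique, and a plain nearest-color assignment in RGB is swapped in for it. The function name split_into_color_planes and the sample image and palette are hypothetical, chosen only for illustration.

```python
import numpy as np

def split_into_color_planes(image, palette):
    """Assign each pixel to its nearest palette color and return one
    binary plane (boolean H x W array) per palette color.

    Assumption: `palette` stands in for the principal colors; the paper
    derives them with ACR, which is not reproduced here.
    """
    pixels = image.reshape(-1, 1, 3).astype(np.int64)
    colors = np.asarray(palette, dtype=np.int64).reshape(1, -1, 3)
    # Squared Euclidean distance from every pixel to every palette color.
    dist = ((pixels - colors) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1).reshape(image.shape[:2])
    return [labels == k for k in range(colors.shape[1])]

# Hypothetical 2 x 3 test image: near-white, near-black and reddish pixels.
image = np.array(
    [[[250, 250, 250], [10, 10, 10], [200, 30, 30]],
     [[245, 255, 250], [0, 5, 0], [255, 0, 0]]],
    dtype=np.uint8,
)
palette = [(255, 255, 255), (0, 0, 0), (255, 0, 0)]  # assumed principal colors
planes = split_into_color_planes(image, palette)
```

Each returned plane is a binary image for one color, so a layout analysis step could then be run on every plane separately, as the abstract outlines. Every pixel belongs to exactly one plane.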