Sciweavers

MVA
1992

Separation of Textual and Non-textual Information within Mixed-Mode Documents

The growing number of convenient publishing systems has led to documents that contain more than just textual information. Graphics and images are combined with text and often overlap one another. In this paper we present a robust algorithm for separating textual from non-textual information within mixed-mode documents without recognizing individual characters. The approach generates connected components and classifies them as text or non-text. As a result, a credibility value is calculated for each connected component, expressing its similarity to text or graphics. Moreover, strings are generated that represent sequences of connected components classified as text. Strings can be aligned in any direction. The main processing steps of our system are connected component generation, neighborhood analysis, and the generation of strings.
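To make the first processing step concrete, here is a minimal sketch of connected component generation followed by a toy text/non-text score. It is not the authors' implementation: the abstract gives no formulas, so the 8-connectivity choice, the function names `connected_components` and `text_credibility`, and the size-based scoring rule with its `max_char_size` parameter are all assumptions made for illustration. Neighborhood analysis and string generation are not covered.

```python
from collections import deque


def connected_components(image):
    """Find 8-connected groups of foreground pixels (value 1).

    Returns one dict per component with its bounding box and pixel count,
    in row-major order of each component's first pixel.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append({"top": top, "left": left, "bottom": bottom,
                          "right": right, "area": area})
    return boxes


def text_credibility(box, max_char_size=4):
    """Hypothetical score in (0, 1]: components no larger than a
    character-sized box score 1.0, larger ones score proportionally less."""
    height = box["bottom"] - box["top"] + 1
    width = box["right"] - box["left"] + 1
    return min(1.0, max_char_size / max(height, width))


# Two small character-like blobs above a long horizontal rule.
image = [
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]
for box in connected_components(image):
    label = "text" if text_credibility(box) >= 0.5 else "non-text"
    print(box, label)
```

A real system would likely base the score on richer features than bounding-box size, and would then use the neighborhood of each text-like component to chain components into strings of arbitrary orientation, as the abstract describes.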
Frank Hönes, Rainer Zimmer
Added 07 Nov 2010
Updated 07 Nov 2010
Type Conference
Year 1992
Where MVA
Authors Frank Hönes, Rainer Zimmer