COLING
2000

Layout and Language: Integrating Spatial and Linguistic Knowledge for Layout Understanding Tasks

Complex documents stored in a flat or partially marked-up file format require layout-sensitive preprocessing before any natural language processing can be carried out on their textual content. Contemporary technology for the discovery of basic textual units is based on either spatial or other content-insensitive methods. However, there are many cases where knowledge of both the language and the layout is required in order to establish the boundaries of the basic textual blocks. This paper describes a number of these cases and proposes the application of a general method combining knowledge about language with knowledge about the spatial arrangement of text. We claim that a comprehensive understanding of layout can only be achieved through the exploitation of layout knowledge and language knowledge in an interdependent manner.
Matthew Hurst, Tetsuya Nasukawa
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 2000
Where COLING
Authors Matthew Hurst, Tetsuya Nasukawa