This research explores the interaction of textual and photographic information in document understanding. The problem of performing generalpurpose vision without apriori knowledge...
Abstract: As web sites are getting more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the diffi...
Typographic and visual information is an integral part of textual documents. Most information extraction systems ignore most of this visual information, processing the text as a l...
This work aims to provide a page segmentation algorithm which uses both visual and content information to extract the semantic structure of a web page. The visual information is u...