Graphical components information extraction is a crucial step in the chart recognition and understanding process. However, existing methods of information extraction from chart im...
We present two machine learning approaches to information extraction from semi-structured documents that can be used if no annotated training data are available, but there does ex...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
With an aim to extract the structural information from the table of contents (TOC) to help develop digital document library the requirement of identifying/segmenting the TOC page ...
S. Mandal, S. P. Chowdhury, Amit Kumar Das, Bhabat...
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form...