Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aim to so...
A large fraction of the useful web comprises of specification documents that largely consist of hattribute name, numeric valuei pairs embedded in text. Examples include product in...
An algorithm is presented that automatically matches images of presentation slides to the symbolic source file (e.g., PowerPointTM or AcrobatTM ) from which they were generated. T...
Abstract--Due to the significant complexity of membrane morphology and the generally poor image quality in electron tomographic volumes, current automatic methods for segmentation ...
Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an ...