Typographic and visual information is an integral part of textual documents. Most information extraction systems ignore most of this visual information, processing the text as a l...
Abstract. The aim of our work is to develop a flexible and powerful Knowledge Acquisition framework that allows users to rapidly develop Natural Language Processing systems, includ...
The Web contains an abundance of useful semistructured information about real world objects, and our empirical study shows that strong sequence characteristics exist for Web infor...
Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, Wei-Y...
In the context of ontology-based information extraction, identity resolution is the process of deciding whether an instance extracted from text refers to a known entity in the tar...
The Word Wide Web has becoming one of the most important information repositories. However, information in web pages is free of standards in presentation, without being organized i...