Information extraction from HTML pages has conventionally treated the pages as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
During the past decade there have been significant advances in the field of Natural Language Processing (NLP) and, in particular, Information Extraction (IE) [2], which have fueled...
Kiyoshi Sudo, Amit Bagga, Lawrence O'Gorman, Jon L...
We present two machine learning approaches to information extraction from semi-structured documents that can be used if no annotated training data are available, but there does ex...
Abstract. The approach of using ontology reasoning to cleanse the output of information extraction tools was first articulated in SemantiClean. A limiting factor in applying this ...
Julian Dolby, James Fan, Achille Fokoue, Aditya Ka...
Abstract. In this paper, we describe a new approach to information extraction that neatly integrates top-down hypothesis-driven information with bottom-up data-driven information. ...