We present new techniques for supervised wrapper generation and automated web information extraction, and a system called Lixto implementing these techniques. Our system can gener...
Traditional approaches to rule-based information extraction (IE) have primarily been based on regular expression grammars. However, these grammar-based systems have difficulty scal...
Frederick Reiss, Sriram Raghavan, Rajasekar Krishn...
Text Mining is a growing area of interest within the field of Data Mining and Knowledge Discovery. Given a collection of text documents, most approaches to Text Mining perform kno...
Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank...
Oren Etzioni, Michael J. Cafarella, Doug Downey, S...
This paper presents a method of automatically constructing information extraction patterns on predicate-argument structures (PASs) obtained by full parsing from a smaller training...