Wrapper is a traditional method to extract useful information from Web pages. Most previous works rely on the similarity between HTML tag trees and induced template-dependent wrap...
abstraction for modeling these problems is to view the Web as a collection of (usually small and heterogeneous) databases, and to view programs that extract and process Web data au...
We have begun a project to automatically create the lexico-syntactic resources for a microplanner as a side-effect of running a domain-specific language understanding system. The ...
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table– a data structure that better lends itself to high-level...
We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we ...
Valentin I. Spitkovsky, Daniel Jurafsky, Hiyan Als...