Sciweavers

25 search results - page 2 / 5
» Deriving link-context from HTML tag tree
AAAI
2007
13 years 7 months ago
Template-Independent News Extraction Based on Visual Consistency
Wrapper is a traditional method to extract useful information from Web pages. Most previous works rely on the similarity between HTML tag trees and induced template-dependent wrap...
Shuyi Zheng, Ruihua Song, Ji-Rong Wen
CACM
1998
13 years 5 months ago
Viewing WISs as Database Applications
abstraction for modeling these problems is to view the Web as a collection of (usually small and heterogeneous) databases, and to view programs that extract and process Web data au...
Gustavo O. Arocena, Alberto O. Mendelzon
INLG
2010
Springer
13 years 3 months ago
'If you've heard it, you can say it' - Towards an Account of Expressibility
We have begun a project to automatically create the lexico-syntactic resources for a microplanner as a side-effect of running a domain-specific language understanding system. The ...
David McDonald, Charlie Greenbacker
ICDM
2006
IEEE
13 years 11 months ago
Unsupervised Learning of Tree Alignment Models for Information Extraction
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level...
Philip Zigoris, Damian Eads, Yi Zhang
ACL
2010
13 years 3 months ago
Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we ...
Valentin I. Spitkovsky, Daniel Jurafsky, Hiyan Als...