Search Sciweavers | Sciweavers

AAAI
2007

Intelligent Agents

Template-Independent News Extraction Based on Visual Consistency

13 years 7 months ago

Wrapper is a traditional method to extract useful information from Web pages. Most previous works rely on the similarity between HTML tag trees and induced template-dependent wrap...

Shuyi Zheng, Ruihua Song, Ji-Rong Wen

CACM
1998

Viewing WISs as Database Applications

13 years 5 months ago

Download www.cs.toronto.edu

abstraction for modeling these problems is to view the Web as a collection of (usually small and heterogeneous) databases, and to view programs that extract and process Web data au...

Gustavo O. Arocena, Alberto O. Mendelzon

INLG
2010
Springer

Natural Language Processing

'If you've heard it, you can say it' - Towards an Account of Expressibility

13 years 3 months ago

Download www.aclweb.org

We have begun a project to automatically create the lexico-syntactic resources for a microplanner as a side-effect of running a domain-specific language understanding system. The ...

David McDonald, Charlie Greenbacker

ICDM
2006
IEEE

Data Mining

Unsupervised Learning of Tree Alignment Models for Information Extraction

13 years 11 months ago

Download users.soe.ucsc.edu

We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level...

Philip Zigoris, Damian Eads, Yi Zhang

ACL
2010

Computational Linguistics

Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing

13 years 3 months ago

Download nlp.stanford.edu

We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we ...

Valentin I. Spitkovsky, Daniel Jurafsky, Hiyan Als...
